Unsupervised Learning: Trade&Ahead

Marks: 60

Context

The stock market has consistently proven to be a good place to invest in and save for the future. There are many compelling reasons to invest in stocks: doing so can help fight inflation, create wealth, and provide some tax benefits. Steady returns sustained over a long period can grow an investment far more than seems possible, and thanks to the power of compounding, the earlier one starts investing, the larger the corpus one can build for retirement. Overall, investing in stocks can help meet life's financial aspirations.

It is important to maintain a diversified portfolio when investing in stocks in order to protect earnings across market conditions. A diversified portfolio tends to yield better risk-adjusted returns by tempering potential losses when the market is down. It is easy to get lost in a sea of financial metrics when determining the worth of a single stock, and repeating that analysis for a multitude of stocks to identify the right picks for an individual can be tedious. Cluster analysis can identify stocks that exhibit similar characteristics and stocks that are minimally correlated with one another. This helps investors analyze stocks across different market segments and protect the portfolio against risks that could make it vulnerable to losses.

Objective

Trade&Ahead is a financial consultancy firm that provides its customers with personalized investment strategies. They have hired you as a Data Scientist and provided you with data comprising stock prices and some financial indicators for a few companies listed on the New York Stock Exchange. They have assigned you the tasks of analyzing the data, grouping the stocks based on the attributes provided, and sharing insights about the characteristics of each group.

Data Dictionary

  • Ticker Symbol: An abbreviation used to uniquely identify publicly traded shares of a particular stock on a particular stock market
  • Security: Name of the company
  • GICS Sector: The specific economic sector assigned to a company by the Global Industry Classification Standard (GICS) that best defines its business operations
  • GICS Sub Industry: The specific sub-industry group assigned to a company by the Global Industry Classification Standard (GICS) that best defines its business operations
  • Current Price: Current stock price in dollars
  • Price Change: Percentage change in the stock price over the past 13 weeks
  • Volatility: Standard deviation of the stock price over the past 13 weeks
  • ROE: A measure of financial performance calculated by dividing net income by shareholders' equity (shareholders' equity is equal to a company's assets minus its liabilities)
  • Cash Ratio: The ratio of a company's total reserves of cash and cash equivalents to its total current liabilities
  • Net Cash Flow: The difference between a company's cash inflows and outflows (in dollars)
  • Net Income: Revenues minus expenses, interest, and taxes (in dollars)
  • Earnings Per Share: Company's net profit divided by the number of common shares it has outstanding (in dollars)
  • Estimated Shares Outstanding: Number of the company's shares currently held by all its shareholders
  • P/E Ratio: Ratio of the company's current stock price to the earnings per share
  • P/B Ratio: Ratio of the company's stock price per share to its book value per share (book value of a company is the difference between that company's total assets and total liabilities)
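
Several of these fields are arithmetic functions of one another, which gives a quick way to check that the definitions above match the data. The sketch below is an illustration only: it takes the values reported for American Airlines Group (AAL) in the first row of the dataset and recomputes the P/E Ratio from its definition. It also assumes that Estimated Shares Outstanding was derived as Net Income divided by Earnings Per Share; that is an inference from the EPS definition, not something the data dictionary states.

```python
import math

# Values for AAL as reported in the first row of the dataset
current_price = 42.349998
net_income = 7_610_000_000
earnings_per_share = 11.39

# P/E Ratio: current stock price divided by earnings per share
pe_ratio = current_price / earnings_per_share

# Assumption: shares outstanding estimated as net income / EPS
estimated_shares = net_income / earnings_per_share

print(f"P/E ratio: {pe_ratio:.6f}")
print(f"Estimated shares outstanding: {estimated_shares:.6e}")
```

Both results agree with the P/E Ratio (3.718174) and Estimated Shares Outstanding (6.681299e+08) reported for AAL, which suggests these two columns are derived from the others rather than measured independently.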

Importing necessary libraries and data

In [1]:
import numpy as np
import pandas as pd

#for visualizations
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

#for missing value imputation
from sklearn.impute import SimpleImputer

#for scaling the data using z-score
from sklearn.preprocessing import StandardScaler
from scipy.spatial.distance import cdist, pdist
from scipy.stats import zscore

#for k-means clustering 
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from yellowbrick.cluster import KElbowVisualizer, SilhouetteVisualizer

#for hierarchical clustering
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage, cophenet

import warnings
warnings.filterwarnings('ignore')
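
The imports above bring in two ways to z-score a feature: StandardScaler and scipy.stats.zscore. As a rough sketch of what that scaling does, the snippet below standardizes a handful of values by hand using only the standard library. The sample is the Volatility of the first five stocks and is illustrative only. It uses the population standard deviation (ddof=0), which is what StandardScaler uses and what zscore uses by default.

```python
from statistics import fmean, pstdev

# Volatility of the first five stocks in the dataset (illustrative sample only)
volatility = [1.687151, 2.197887, 1.273646, 1.357679, 1.701169]

mean = fmean(volatility)
std = pstdev(volatility)  # population standard deviation (ddof=0)

# z-score: how many standard deviations each value lies from the mean
z_scores = [(v - mean) / std for v in volatility]

for v, z in zip(volatility, z_scores):
    print(f"{v:.6f} -> {z:+.3f}")
```

After scaling, the values have mean 0 and unit standard deviation, so features measured in dollars and features measured as ratios contribute on a comparable scale to distance-based clustering.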

Data Overview

  • Observations
  • Sanity checks
In [2]:
#link google drive
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [3]:
data = pd.read_csv('/content/drive/MyDrive/Project7/stock_data.csv')
data1 = data.copy()
df = data.copy()

df.head()
Out[3]:
|   | Ticker Symbol | Security | GICS Sector | GICS Sub Industry | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAL | American Airlines Group | Industrials | Airlines | 42.349998 | 9.999995 | 1.687151 | 135 | 51 | -604000000 | 7610000000 | 11.39 | 6.681299e+08 | 3.718174 | -8.784219 |
| 1 | ABBV | AbbVie | Health Care | Pharmaceuticals | 59.240002 | 8.339433 | 2.197887 | 130 | 77 | 51000000 | 5144000000 | 3.15 | 1.633016e+09 | 18.806350 | -8.750068 |
| 2 | ABT | Abbott Laboratories | Health Care | Health Care Equipment | 44.910000 | 11.301121 | 1.273646 | 21 | 67 | 938000000 | 4423000000 | 2.94 | 1.504422e+09 | 15.275510 | -0.394171 |
| 3 | ADBE | Adobe Systems Inc | Information Technology | Application Software | 93.940002 | 13.977195 | 1.357679 | 9 | 180 | -240840000 | 629551000 | 1.26 | 4.996437e+08 | 74.555557 | 4.199651 |
| 4 | ADI | Analog Devices, Inc. | Information Technology | Semiconductors | 55.320000 | -1.827858 | 1.701169 | 14 | 272 | 315120000 | 696878000 | 0.31 | 2.247994e+09 | 178.451613 | 1.059810 |
In [4]:
df.shape
Out[4]:
(340, 15)

The stock data has 340 rows and 15 columns.

In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 340 entries, 0 to 339
Data columns (total 15 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Ticker Symbol                 340 non-null    object 
 1   Security                      340 non-null    object 
 2   GICS Sector                   340 non-null    object 
 3   GICS Sub Industry             340 non-null    object 
 4   Current Price                 340 non-null    float64
 5   Price Change                  340 non-null    float64
 6   Volatility                    340 non-null    float64
 7   ROE                           340 non-null    int64  
 8   Cash Ratio                    340 non-null    int64  
 9   Net Cash Flow                 340 non-null    int64  
 10  Net Income                    340 non-null    int64  
 11  Earnings Per Share            340 non-null    float64
 12  Estimated Shares Outstanding  340 non-null    float64
 13  P/E Ratio                     340 non-null    float64
 14  P/B Ratio                     340 non-null    float64
dtypes: float64(7), int64(4), object(4)
memory usage: 40.0+ KB

There are 4 columns of type object (Ticker Symbol, Security, GICS Sector and GICS Sub Industry) and 11 numeric columns, 7 of type float64 and 4 of type int64.
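
Because clustering operates on the numeric attributes, it can help to separate the two groups of columns up front. The following is a minimal sketch using a two-row stand-in frame with a subset of the columns rather than the full dataset; `demo`, `num_cols` and `cat_cols` are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Two-row stand-in with a subset of the columns (values from the first rows of the data)
demo = pd.DataFrame({
    "Ticker Symbol": ["AAL", "ABBV"],
    "GICS Sector": ["Industrials", "Health Care"],
    "Current Price": [42.349998, 59.240002],
    "ROE": [135, 130],
})

# Split column names by dtype: numeric (int/float) versus everything else
num_cols = demo.select_dtypes(include="number").columns.tolist()
cat_cols = demo.select_dtypes(exclude="number").columns.tolist()

print(num_cols)
print(cat_cols)
```

Applied to the notebook's `df`, the same two calls would be expected to return the 11 numeric and 4 object columns listed by `df.info()`.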

In [6]:
df.describe().T
Out[6]:
|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Current Price | 340.0 | 8.086234e+01 | 9.805509e+01 | 4.500000e+00 | 3.855500e+01 | 5.970500e+01 | 9.288000e+01 | 1.274950e+03 |
| Price Change | 340.0 | 4.078194e+00 | 1.200634e+01 | -4.712969e+01 | -9.394838e-01 | 4.819505e+00 | 1.069549e+01 | 5.505168e+01 |
| Volatility | 340.0 | 1.525976e+00 | 5.917984e-01 | 7.331632e-01 | 1.134878e+00 | 1.385593e+00 | 1.695549e+00 | 4.580042e+00 |
| ROE | 340.0 | 3.959706e+01 | 9.654754e+01 | 1.000000e+00 | 9.750000e+00 | 1.500000e+01 | 2.700000e+01 | 9.170000e+02 |
| Cash Ratio | 340.0 | 7.002353e+01 | 9.042133e+01 | 0.000000e+00 | 1.800000e+01 | 4.700000e+01 | 9.900000e+01 | 9.580000e+02 |
| Net Cash Flow | 340.0 | 5.553762e+07 | 1.946365e+09 | -1.120800e+10 | -1.939065e+08 | 2.098000e+06 | 1.698108e+08 | 2.076400e+10 |
| Net Income | 340.0 | 1.494385e+09 | 3.940150e+09 | -2.352800e+10 | 3.523012e+08 | 7.073360e+08 | 1.899000e+09 | 2.444200e+10 |
| Earnings Per Share | 340.0 | 2.776662e+00 | 6.587779e+00 | -6.120000e+01 | 1.557500e+00 | 2.895000e+00 | 4.620000e+00 | 5.009000e+01 |
| Estimated Shares Outstanding | 340.0 | 5.770283e+08 | 8.458496e+08 | 2.767216e+07 | 1.588482e+08 | 3.096751e+08 | 5.731175e+08 | 6.159292e+09 |
| P/E Ratio | 340.0 | 3.261256e+01 | 4.434873e+01 | 2.935451e+00 | 1.504465e+01 | 2.081988e+01 | 3.176476e+01 | 5.280391e+02 |
| P/B Ratio | 340.0 | -1.718249e+00 | 1.396691e+01 | -7.611908e+01 | -4.352056e+00 | -1.067170e+00 | 3.917066e+00 | 1.290646e+02 |
In [7]:
df.isnull().sum()
Out[7]:
Ticker Symbol                   0
Security                        0
GICS Sector                     0
GICS Sub Industry               0
Current Price                   0
Price Change                    0
Volatility                      0
ROE                             0
Cash Ratio                      0
Net Cash Flow                   0
Net Income                      0
Earnings Per Share              0
Estimated Shares Outstanding    0
P/E Ratio                       0
P/B Ratio                       0
dtype: int64

There are no missing values.

In [8]:
df.duplicated().sum()
Out[8]:
0

There are no duplicates in the data set.
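
The checks above can be bundled into a single helper so they are easy to rerun if the data is reloaded. This is a sketch rather than part of the original notebook: `sanity_report` is a hypothetical helper name, and the extra ticker-uniqueness check is an assumption based on the data dictionary's description of Ticker Symbol as a unique identifier. It is demonstrated on a small stand-in frame.

```python
import pandas as pd

def sanity_report(frame: pd.DataFrame, key: str) -> dict:
    """Summarize basic data-quality checks for a DataFrame."""
    return {
        "shape": frame.shape,
        "missing_values": int(frame.isnull().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
        "key_is_unique": bool(frame[key].is_unique),
    }

# Stand-in for the stock data (three rows, subset of columns)
demo = pd.DataFrame({
    "Ticker Symbol": ["AAL", "ABBV", "ABT"],
    "Current Price": [42.349998, 59.240002, 44.91],
})

report = sanity_report(demo, key="Ticker Symbol")
print(report)
```

On the notebook's data this would be called as `sanity_report(df, key="Ticker Symbol")`.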

In [9]:
cols = df.columns

# Print the unique values of every column, with a separator line after each one
for col in cols:
    print(f"Unique values in {col}", df[col].unique())
    print("---" * 100)
Unique values in Ticker Symbol ['AAL' 'ABBV' 'ABT' 'ADBE' 'ADI' 'ADM' 'ADS' 'AEE' 'AEP' 'AFL' 'AIG' 'AIV'
 'AIZ' 'AJG' 'AKAM' 'ALB' 'ALK' 'ALL' 'ALLE' 'ALXN' 'AMAT' 'AME' 'AMG'
 'AMGN' 'AMP' 'AMT' 'AMZN' 'AN' 'ANTM' 'AON' 'APA' 'APC' 'APH' 'ARNC'
 'ATVI' 'AVB' 'AVGO' 'AWK' 'AXP' 'BA' 'BAC' 'BAX' 'BBT' 'BCR' 'BHI' 'BIIB'
 'BK' 'BLL' 'BMY' 'BSX' 'BWA' 'BXP' 'C' 'CAT' 'CB' 'CBG' 'CCI' 'CCL'
 'CELG' 'CF' 'CFG' 'CHD' 'CHK' 'CHRW' 'CHTR' 'CI' 'CINF' 'CL' 'CMA' 'CME'
 'CMG' 'CMI' 'CMS' 'CNC' 'CNP' 'COF' 'COG' 'COO' 'CSX' 'CTL' 'CTSH' 'CTXS'
 'CVS' 'CVX' 'CXO' 'D' 'DAL' 'DD' 'DE' 'DFS' 'DGX' 'DHR' 'DIS' 'DISCA'
 'DISCK' 'DLPH' 'DLR' 'DNB' 'DOV' 'DPS' 'DUK' 'DVA' 'DVN' 'EBAY' 'ECL'
 'ED' 'EFX' 'EIX' 'EMN' 'EOG' 'EQIX' 'EQR' 'EQT' 'ES' 'ESS' 'ETFC' 'ETN'
 'ETR' 'EW' 'EXC' 'EXPD' 'EXPE' 'EXR' 'F' 'FAST' 'FB' 'FBHS' 'FCX' 'FE'
 'FIS' 'FISV' 'FLIR' 'FLR' 'FLS' 'FMC' 'FRT' 'FSLR' 'FTR' 'GD' 'GGP'
 'GILD' 'GLW' 'GM' 'GPC' 'GRMN' 'GT' 'GWW' 'HAL' 'HAS' 'HBAN' 'HCA' 'HCN'
 'HCP' 'HES' 'HIG' 'HOG' 'HON' 'HPE' 'HPQ' 'HRL' 'HSIC' 'HST' 'HSY' 'HUM'
 'IBM' 'IDXX' 'IFF' 'INTC' 'IP' 'IPG' 'IRM' 'ISRG' 'ITW' 'IVZ' 'JBHT'
 'JEC' 'JNPR' 'JPM' 'KIM' 'KMB' 'KMI' 'KO' 'KSU' 'LEG' 'LEN' 'LH' 'LKQ'
 'LLL' 'LLY' 'LMT' 'LNT' 'LUK' 'LUV' 'LVLT' 'LYB' 'MA' 'MAA' 'MAC' 'MAR'
 'MAS' 'MAT' 'MCD' 'MCO' 'MDLZ' 'MET' 'MHK' 'MJN' 'MKC' 'MLM' 'MMC' 'MMM'
 'MNST' 'MO' 'MOS' 'MPC' 'MRK' 'MRO' 'MTB' 'MTD' 'MUR' 'MYL' 'NAVI' 'NBL'
 'NDAQ' 'NEE' 'NEM' 'NFLX' 'NFX' 'NLSN' 'NOV' 'NSC' 'NTRS' 'NUE' 'NWL' 'O'
 'OKE' 'OMC' 'ORLY' 'OXY' 'PBCT' 'PBI' 'PCAR' 'PCG' 'PCLN' 'PEG' 'PEP'
 'PFE' 'PFG' 'PG' 'PGR' 'PHM' 'PM' 'PNC' 'PNR' 'PNW' 'PPG' 'PPL' 'PRU'
 'PSX' 'PWR' 'PX' 'PYPL' 'R' 'RCL' 'REGN' 'RHI' 'ROP' 'RRC' 'RSG' 'SCG'
 'SCHW' 'SE' 'SEE' 'SHW' 'SLG' 'SNI' 'SO' 'SPG' 'SPGI' 'SRCL' 'SRE' 'STI'
 'STT' 'SWKS' 'SWN' 'SYF' 'SYK' 'T' 'TAP' 'TDC' 'TGNA' 'TMK' 'TMO' 'TRIP'
 'TRV' 'TSCO' 'TSN' 'TSO' 'TSS' 'TXN' 'UAA' 'UAL' 'UDR' 'UHS' 'UNH' 'UNM'
 'UNP' 'UPS' 'UTX' 'VAR' 'VLO' 'VMC' 'VNO' 'VRSK' 'VRSN' 'VRTX' 'VTR' 'VZ'
 'WAT' 'WEC' 'WFC' 'WHR' 'WM' 'WMB' 'WU' 'WY' 'WYN' 'WYNN' 'XEC' 'XEL'
 'XL' 'XOM' 'XRAY' 'XRX' 'XYL' 'YHOO' 'YUM' 'ZBH' 'ZION' 'ZTS']
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Security ['American Airlines Group' 'AbbVie' 'Abbott Laboratories'
 'Adobe Systems Inc' 'Analog Devices, Inc.' 'Archer-Daniels-Midland Co'
 'Alliance Data Systems' 'Ameren Corp' 'American Electric Power'
 'AFLAC Inc' 'American International Group, Inc.'
 'Apartment Investment & Mgmt' 'Assurant Inc' 'Arthur J. Gallagher & Co.'
 'Akamai Technologies Inc' 'Albemarle Corp' 'Alaska Air Group Inc'
 'Allstate Corp' 'Allegion' 'Alexion Pharmaceuticals'
 'Applied Materials Inc' 'AMETEK Inc' 'Affiliated Managers Group Inc'
 'Amgen Inc' 'Ameriprise Financial' 'American Tower Corp A'
 'Amazon.com Inc' 'AutoNation Inc' 'Anthem Inc.' 'Aon plc'
 'Apache Corporation' 'Anadarko Petroleum Corp' 'Amphenol Corp'
 'Arconic Inc' 'Activision Blizzard' 'AvalonBay Communities, Inc.'
 'Broadcom' 'American Water Works Company Inc' 'American Express Co'
 'Boeing Company' 'Bank of America Corp' 'Baxter International Inc.'
 'BB&T Corporation' 'Bard (C.R.) Inc.' 'Baker Hughes Inc'
 'BIOGEN IDEC Inc.' 'The Bank of New York Mellon Corp.' 'Ball Corp'
 'Bristol-Myers Squibb' 'Boston Scientific' 'BorgWarner'
 'Boston Properties' 'Citigroup Inc.' 'Caterpillar Inc.' 'Chubb Limited'
 'CBRE Group' 'Crown Castle International Corp.' 'Carnival Corp.'
 'Celgene Corp.' 'CF Industries Holdings Inc' 'Citizens Financial Group'
 'Church & Dwight' 'Chesapeake Energy' 'C. H. Robinson Worldwide'
 'Charter Communications' 'CIGNA Corp.' 'Cincinnati Financial'
 'Colgate-Palmolive' 'Comerica Inc.' 'CME Group Inc.'
 'Chipotle Mexican Grill' 'Cummins Inc.' 'CMS Energy'
 'Centene Corporation' 'CenterPoint Energy' 'Capital One Financial'
 'Cabot Oil & Gas' 'The Cooper Companies' 'CSX Corp.' 'CenturyLink Inc'
 'Cognizant Technology Solutions' 'Citrix Systems' 'CVS Health'
 'Chevron Corp.' 'Concho Resources' 'Dominion Resources' 'Delta Air Lines'
 'Du Pont (E.I.)' 'Deere & Co.' 'Discover Financial Services'
 'Quest Diagnostics' 'Danaher Corp.' 'The Walt Disney Company'
 'Discovery Communications-A' 'Discovery Communications-C'
 'Delphi Automotive' 'Digital Realty Trust' 'Dun & Bradstreet'
 'Dover Corp.' 'Dr Pepper Snapple Group' 'Duke Energy' 'DaVita Inc.'
 'Devon Energy Corp.' 'eBay Inc.' 'Ecolab Inc.' 'Consolidated Edison'
 'Equifax Inc.' "Edison Int'l" 'Eastman Chemical' 'EOG Resources'
 'Equinix' 'Equity Residential' 'EQT Corporation' 'Eversource Energy'
 'Essex Property Trust, Inc.' 'E*Trade' 'Eaton Corporation'
 'Entergy Corp.' 'Edwards Lifesciences' 'Exelon Corp.' "Expeditors Int'l"
 'Expedia Inc.' 'Extra Space Storage' 'Ford Motor' 'Fastenal Co'
 'Facebook' 'Fortune Brands Home & Security' 'Freeport-McMoran Cp & Gld'
 'FirstEnergy Corp' 'Fidelity National Information Services' 'Fiserv Inc'
 'FLIR Systems' 'Fluor Corp.' 'Flowserve Corporation' 'FMC Corporation'
 'Federal Realty Investment Trust' 'First Solar Inc'
 'Frontier Communications' 'General Dynamics'
 'General Growth Properties Inc.' 'Gilead Sciences' 'Corning Inc.'
 'General Motors' 'Genuine Parts' 'Garmin Ltd.' 'Goodyear Tire & Rubber'
 'Grainger (W.W.) Inc.' 'Halliburton Co.' 'Hasbro Inc.'
 'Huntington Bancshares' 'HCA Holdings' 'Welltower Inc.' 'HCP Inc.'
 'Hess Corporation' 'Hartford Financial Svc.Gp.' 'Harley-Davidson'
 "Honeywell Int'l Inc." 'Hewlett Packard Enterprise' 'HP Inc.'
 'Hormel Foods Corp.' 'Henry Schein' 'Host Hotels & Resorts'
 'The Hershey Company' 'Humana Inc.' 'International Business Machines'
 'IDEXX Laboratories' 'Intl Flavors & Fragrances' 'Intel Corp.'
 'International Paper' 'Interpublic Group' 'Iron Mountain Incorporated'
 'Intuitive Surgical Inc.' 'Illinois Tool Works' 'Invesco Ltd.'
 'J. B. Hunt Transport Services' 'Jacobs Engineering Group'
 'Juniper Networks' 'JPMorgan Chase & Co.' 'Kimco Realty' 'Kimberly-Clark'
 'Kinder Morgan' 'Coca Cola Company' 'Kansas City Southern'
 'Leggett & Platt' 'Lennar Corp.' 'Laboratory Corp. of America Holding'
 'LKQ Corporation' 'L-3 Communications Holdings' 'Lilly (Eli) & Co.'
 'Lockheed Martin Corp.' 'Alliant Energy Corp' 'Leucadia National Corp.'
 'Southwest Airlines' 'Level 3 Communications' 'LyondellBasell'
 'Mastercard Inc.' 'Mid-America Apartments' 'Macerich' "Marriott Int'l."
 'Masco Corp.' 'Mattel Inc.' "McDonald's Corp." "Moody's Corp"
 'Mondelez International' 'MetLife Inc.' 'Mohawk Industries'
 'Mead Johnson' 'McCormick & Co.' 'Martin Marietta Materials'
 'Marsh & McLennan' '3M Company' 'Monster Beverage' 'Altria Group Inc'
 'The Mosaic Company' 'Marathon Petroleum' 'Merck & Co.'
 'Marathon Oil Corp.' 'M&T Bank Corp.' 'Mettler Toledo' 'Murphy Oil'
 'Mylan N.V.' 'Navient' 'Noble Energy Inc' 'NASDAQ OMX Group'
 'NextEra Energy' 'Newmont Mining Corp. (Hldg. Co.)' 'Netflix Inc.'
 'Newfield Exploration Co' 'Nielsen Holdings'
 'National Oilwell Varco Inc.' 'Norfolk Southern Corp.'
 'Northern Trust Corp.' 'Nucor Corp.' 'Newell Brands'
 'Realty Income Corporation' 'ONEOK' 'Omnicom Group' "O'Reilly Automotive"
 'Occidental Petroleum' "People's United Financial" 'Pitney-Bowes'
 'PACCAR Inc.' 'PG&E Corp.' 'Priceline.com Inc'
 'Public Serv. Enterprise Inc.' 'PepsiCo Inc.' 'Pfizer Inc.'
 'Principal Financial Group' 'Procter & Gamble' 'Progressive Corp.'
 'Pulte Homes Inc.' 'Philip Morris International' 'PNC Financial Services'
 'Pentair Ltd.' 'Pinnacle West Capital' 'PPG Industries' 'PPL Corp.'
 'Prudential Financial' 'Phillips 66' 'Quanta Services Inc.'
 'Praxair Inc.' 'PayPal' 'Ryder System' 'Royal Caribbean Cruises Ltd'
 'Regeneron' 'Robert Half International' 'Roper Industries'
 'Range Resources Corp.' 'Republic Services Inc' 'SCANA Corp'
 'Charles Schwab Corporation' 'Spectra Energy Corp.' 'Sealed Air'
 'Sherwin-Williams' 'SL Green Realty' 'Scripps Networks Interactive Inc.'
 'Southern Co.' 'Simon Property Group Inc' 'S&P Global, Inc.'
 'Stericycle Inc' 'Sempra Energy' 'SunTrust Banks' 'State Street Corp.'
 'Skyworks Solutions' 'Southwestern Energy' 'Synchrony Financial'
 'Stryker Corp.' 'AT&T Inc' 'Molson Coors Brewing Company'
 'Teradata Corp.' 'Tegna, Inc.' 'Torchmark Corp.'
 'Thermo Fisher Scientific' 'TripAdvisor' 'The Travelers Companies Inc.'
 'Tractor Supply Company' 'Tyson Foods' 'Tesoro Petroleum Co.'
 'Total System Services' 'Texas Instruments' 'Under Armour'
 'United Continental Holdings' 'UDR Inc' 'Universal Health Services, Inc.'
 'United Health Group Inc.' 'Unum Group' 'Union Pacific'
 'United Parcel Service' 'United Technologies' 'Varian Medical Systems'
 'Valero Energy' 'Vulcan Materials' 'Vornado Realty Trust'
 'Verisk Analytics' 'Verisign Inc.' 'Vertex Pharmaceuticals Inc'
 'Ventas Inc' 'Verizon Communications' 'Waters Corporation'
 'Wec Energy Group Inc' 'Wells Fargo' 'Whirlpool Corp.'
 'Waste Management Inc.' 'Williams Cos.' 'Western Union Co'
 'Weyerhaeuser Corp.' 'Wyndham Worldwide' 'Wynn Resorts Ltd'
 'Cimarex Energy' 'Xcel Energy Inc' 'XL Capital' 'Exxon Mobil Corp.'
 'Dentsply Sirona' 'Xerox Corp.' 'Xylem Inc.' 'Yahoo Inc.'
 'Yum! Brands Inc' 'Zimmer Biomet Holdings' 'Zions Bancorp' 'Zoetis']
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in GICS Sector ['Industrials' 'Health Care' 'Information Technology' 'Consumer Staples'
 'Utilities' 'Financials' 'Real Estate' 'Materials'
 'Consumer Discretionary' 'Energy' 'Telecommunications Services']
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in GICS Sub Industry ['Airlines' 'Pharmaceuticals' 'Health Care Equipment'
 'Application Software' 'Semiconductors' 'Agricultural Products'
 'Data Processing & Outsourced Services' 'MultiUtilities'
 'Electric Utilities' 'Life & Health Insurance'
 'Property & Casualty Insurance' 'REITs' 'Multi-line Insurance'
 'Insurance Brokers' 'Internet Software & Services' 'Specialty Chemicals'
 'Building Products' 'Biotechnology' 'Semiconductor Equipment'
 'Electrical Components & Equipment' 'Asset Management & Custody Banks'
 'Specialized REITs' 'Internet & Direct Marketing Retail'
 'Specialty Stores' 'Managed Health Care'
 'Oil & Gas Exploration & Production' 'Electronic Components'
 'Aerospace & Defense' 'Home Entertainment Software' 'Residential REITs'
 'Water Utilities' 'Consumer Finance' 'Banks'
 'Oil & Gas Equipment & Services' 'Metal & Glass Containers'
 'Health Care Distributors' 'Auto Parts & Equipment'
 'Construction & Farm Machinery & Heavy Trucks' 'Real Estate Services'
 'Hotels, Resorts & Cruise Lines' 'Fertilizers & Agricultural Chemicals'
 'Regional Banks' 'Household Products' 'Integrated Oil & Gas'
 'Air Freight & Logistics' 'Cable & Satellite'
 'Financial Exchanges & Data' 'Restaurants' 'Industrial Machinery'
 'Health Care Supplies' 'Railroads'
 'Integrated Telecommunications Services' 'IT Consulting & Other Services'
 'Drug Retail' 'Diversified Chemicals' 'Health Care Facilities'
 'Industrial Conglomerates' 'Broadcasting & Cable TV'
 'Research & Consulting Services' 'Soft Drinks'
 'Investment Banking & Brokerage' 'Automobile Manufacturers' 'Copper'
 'Electronic Equipment & Instruments' 'Diversified Commercial Services'
 'Retail REITs' 'Consumer Electronics' 'Tires & Rubber'
 'Industrial Materials' 'Leisure Products' 'Motorcycle Manufacturers'
 'Technology Hardware, Storage & Peripherals' 'Computer Hardware'
 'Packaged Foods & Meats' 'Paper Packaging' 'Advertising' 'Trucking'
 'Networking Equipment' 'Oil & Gas Refining & Marketing & Transportation'
 'Homebuilding' 'Distributors' 'Multi-Sector Holdings'
 'Alternative Carriers' 'Diversified Financial Services'
 'Home Furnishings' 'Construction Materials' 'Tobacco'
 'Life Sciences Tools & Services' 'Gold' 'Steel'
 'Housewares & Specialties' 'Thrifts & Mortgage Finance'
 'Technology, Hardware, Software and Supplies' 'Personal Products'
 'Industrial Gases' 'Human Resource & Employment Services' 'Office REITs'
 'Brewers' 'Publishing' 'Specialty Retail'
 'Apparel, Accessories & Luxury Goods' 'Household Appliances'
 'Environmental Services' 'Casinos & Gaming']
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Current Price [  42.349998     59.240002     44.91         93.940002     55.32
   36.68        276.570007     43.23         58.27         59.900002
   61.970001     40.029999     80.540001     40.939999     52.630001
   56.009998     80.510002     62.09         65.919998    190.75
   18.67         53.59        159.759995    162.330002    106.419998
   96.949997    675.890015     59.66        139.440002     92.209999
   44.470001     48.580002     52.23          7.3988066    38.709999
  184.130005    145.149994     59.75         69.550003    144.589996
   16.83         38.150002     37.810001    189.440002     46.150002
  306.350006     41.220001     72.730003     68.790001     18.440001
  127.540001     51.75         67.959999    116.849998     34.580002
   86.449997     54.48        119.760002     40.810001     26.190001
   42.4399985     4.5          62.02        183.100006    146.330002
   59.169998     66.620003     41.830002     90.599998    479.850006
   88.010002     36.080002     65.809998     18.360001     72.18
   17.690001    134.199997     25.950001     25.16         60.02
   75.650002     97.769997     89.959999     92.860001     67.639999
   50.689999     66.599998     76.269997     53.619999     71.139999
   70.41698484  105.080002     26.68         25.219999     85.730003
   75.620003    103.93         61.310001     93.199997     71.389999
   69.709999     32.           27.48        114.379997     64.269997
  111.370003     59.209999     67.510002     70.790001    302.399994
   81.589996     52.130001     51.07        239.410004     29.639999
   52.040001     68.360001     78.980003     27.77         45.099998
  124.300003     88.209999     14.09         40.82        104.660004
   55.5           6.77         31.73         60.599998     91.459999
   28.07         47.220001     42.080002     39.130001    146.100006
   65.989998      4.67        137.360001     27.209999    101.190002
   18.280001     34.009998     85.889999     37.169998     32.669998
  202.589996     34.040001     67.360001     11.06         67.629997
   68.029999     34.82695902   48.48         43.459999     45.389999
  103.57         15.2          11.84         39.540001    158.190002
   15.34         89.269997    178.509995    137.619995     72.919998
  119.639999     34.450001     37.700001     23.280001     27.01
  546.159973     92.68         33.48         73.360001     41.950001
   27.6          66.029999     26.459999    127.300003     14.92
   42.959999     74.669998     42.02         48.91        123.639999
   29.629999    119.510002     84.260002    217.149994     31.2250005
   17.389999     43.060001     54.360001     86.900002     97.360001
   90.809998     80.690002     67.040001     28.299999     27.17
  118.139999    100.339996     44.84         48.209999    189.389999
   78.949997     85.559998    136.580002     55.450001    150.639999
   49.65333167   58.209999     27.59         51.84         52.82
   12.59        121.18        339.130005     22.450001     54.07
   11.45         32.93         58.169998    103.889999     17.99
   32.560001     46.599998     33.490002     84.589996     72.089996
   40.299999     44.080002     51.630001     24.66         75.660004
  253.419998     67.610001     16.15         20.65         47.400002
   53.189999   1274.949951     38.689999     99.919998     32.279999
   44.98         79.410004     31.799999     17.82         87.910004
   95.309998     49.529999     64.480003     98.82         34.130001
   81.410004     81.800003     20.25        102.400002     36.200001
   56.830002    101.209999    542.869995     47.139999    189.789993
   24.610001     43.990002     60.490002     23.940001     44.599998
  259.600006    112.980003     55.209999     46.790001    194.440002
   98.580002    120.599998     94.010002     42.84         66.360001
   76.830002      7.11         30.41         92.940002     34.41
   93.919998     26.42         25.52         57.16        141.850006
   85.25        112.860001     85.5          53.330002    105.370003
   49.799999     54.810001     80.610001     57.299999     37.57
  119.489998    117.639999     33.290001     78.199997     96.230003
   96.07         80.800003     70.709999     94.970001     99.959999
   76.879997     87.360001    125.830002     56.43         46.220001
  134.580002     51.310001    146.869995     53.369999     25.700001
   17.91         29.98         72.650002     69.190002     89.379997
   35.91         39.18         77.949997     60.849998     10.63
   36.5          33.259998     52.51617541  102.589996     27.299999
   47.919998  ]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Price Change [ 9.99999481e+00  8.33943306e+00  1.13011208e+01  1.39771952e+01
 -1.82785810e+00 -1.20172682e+01  6.18928557e+00  2.17442444e+00
  2.37175342e+00  3.02718100e+00  8.35810821e+00  7.57860810e+00
  1.89777325e+00 -6.06943448e-01 -2.37909028e+01  2.64619479e+01
  2.06643644e+00  6.59227468e+00  1.37532301e+01  2.23383802e+01
  2.68342391e+01  2.21247377e+00 -6.61133544e+00  1.71634778e+01
 -2.42068591e+00  1.02330873e+01  3.22681047e+01  2.35031562e+00
 -6.20052749e-01  3.91030097e+00  1.13978037e+01 -2.08020835e+01
  2.69366688e+00  1.64778443e+00  2.33195293e+01  4.85763024e+00
  1.79026828e+01  8.59687386e+00 -6.21629012e+00  1.01050779e+01
  8.44072165e+00  1.67023651e+01  5.94004500e+00  1.54918196e+00
 -1.23123672e+01  4.91798229e+00  5.42200283e+00  1.65358164e+01
  1.60816796e+01  1.17575818e+01  3.47056255e+00  7.20349662e+00
  4.71469465e+00  3.55020891e+00  1.31938338e+01  8.19775683e+00
  9.56906820e+00  8.93821671e+00  8.44879290e+00 -9.25061131e+00
  1.02736884e+01  1.04761548e+00 -3.81017882e+01 -9.00822130e+00
  3.59850674e+00  8.68241465e+00  9.77735771e+00  4.78137921e+00
  1.90012916e+00 -2.40224491e+00 -3.31312678e+01 -1.88847908e+01
  1.94971184e+00  2.17125911e+01  1.43646961e+00 -6.19574582e-01
 -2.00993595e+01 -9.67221466e+00 -4.34942146e+00  1.59231682e-01
 -4.65448920e+00  9.02147729e+00  1.32656133e+00  1.28449547e+01
 -2.74403217e+00 -3.98864176e+00  1.33750842e+01  3.74896767e+01
  3.95256083e+00  3.65358399e+00  1.56747951e+01  8.92459471e+00
  2.04914148e+00  2.02676864e+00  3.57289117e+00  1.21093264e+01
  1.55739004e+01 -1.18843887e+00  6.97958459e+00  1.80493990e+01
 -8.33447724e-01 -3.62229079e+00 -1.54780794e+01  1.21632653e+01
  3.78368391e+00 -3.97430599e+00  1.45310626e+01 -6.13507114e+00
  3.65423785e+00 -4.07859333e+00  1.00196502e+01  8.03760493e+00
 -2.12537714e+01  7.09921134e-01  6.76507254e+00  1.26567850e+01
  1.16641138e+00  4.91098343e+00  1.16167337e+01 -6.40377486e+00
 -4.44915880e+00  4.89451730e+00  1.39222511e+01  2.39825581e+00
  1.09842336e+01  1.62243204e+01  1.68175170e+01 -3.16851665e+01
  1.17984371e+00 -1.05535085e+01  5.23529489e+00  2.14209211e-01
  1.08190563e+01  2.21035716e+00  1.50882382e+01  6.80606293e+00
  5.50516834e+01 -2.30125523e+00 -4.63767391e-01  4.21293741e+00
  2.68926423e+00  6.58892711e+00  1.22812706e+01  4.03343154e+00
  3.39359379e+00  1.04462407e+01 -5.33619890e+00 -5.10175091e+00
 -7.07683424e+00  4.14312618e+00 -1.25323370e+01  4.41161760e-02
  2.21865805e+00 -4.58571335e+00 -5.00546667e+00 -1.72470362e+01
  9.32024719e+00 -1.78378378e+01  2.16175949e+00  2.44962248e+01
  1.83171321e+01 -3.21766562e+00 -3.26181408e+00 -1.45443304e-01
 -5.29213620e+00 -1.56588009e+00  1.49610829e+01  1.40350948e+01
 -2.65128620e-02  2.18210350e+01 -1.30672675e+01  1.87330126e+01
  1.27768314e+01  7.06747682e+00  2.96140491e+00  1.15394839e+01
  7.35122938e+00  8.03337710e+00  8.70993837e+00  1.75113086e+01
 -4.71296934e+01  6.81252594e+00 -1.84380169e+01  1.96554482e+00
  1.70513620e+00  1.41748988e+01  4.44130404e+00  1.45390134e+01
  7.89478488e-01  5.25422719e+00  6.64275945e+00 -1.42927642e+01
  1.38551058e+01  2.47075040e+01  2.57318340e+00  7.49696478e+00
  1.06224905e+01  4.18335071e+00 -1.97396991e+00  1.16370769e+01
  3.06250063e+01  1.99390853e+01  2.34597611e+00  6.07996215e+00
  1.36669047e+00  3.51442488e+00  1.20811964e+01  6.97673767e+00
 -1.08660148e+01  6.02294849e+00  5.92784726e+00  1.08003574e+01
  6.88578786e+00 -1.12290862e+01  1.15078463e+01  7.03141265e+00
 -2.02659911e+01 -3.61785059e-01  1.89429051e+01 -8.59119742e+00
  3.31773465e+01  1.86832740e+00  7.29879090e+00  8.81032377e+00
  6.23785452e+00  1.08441158e+01  1.11456540e+01 -3.29669458e+00
  4.93131727e+00 -1.25587392e+01  9.52996596e+00  5.79688444e+00
  6.58555391e+00  9.98003942e+00  8.42083596e+00 -2.41230769e+01
  1.48103212e+01  9.64142629e-01  8.65287198e-01  3.12899106e+00
  3.82102080e+00 -9.31700402e+00  5.10205991e-01  3.19052723e+00
 -8.23055266e+00  6.07218809e+00  3.13099052e+00 -5.30526316e+00
  1.06605376e+01  3.51562511e+00 -5.56439292e+00  1.03288203e+01
  6.99370887e+00 -3.03446151e+00  4.98751528e-01  1.91607380e+01
  3.42424545e+00  6.58550301e+00  5.37164261e+00 -1.66323624e+01
  2.93833502e-01  1.74562005e+01 -2.32441907e+01  1.34259729e+01
  1.69953198e+01 -7.65915784e+00  2.04327672e+01 -2.51065117e+01
  6.74594290e+00  7.23276369e+00  1.54628331e+01 -9.89837787e+00
 -5.14675032e+00  1.65379834e+01  4.00442430e+00  1.22382601e+01
  4.37206985e+00  5.28482205e+00  1.40444235e+01 -1.39063419e+01
 -2.79184886e+00  1.19707325e+01 -8.66449033e-01 -8.51393278e+00
 -4.47981366e+01 -2.87447789e+00 -1.65079153e+00  5.94211823e+00
  1.31293681e+01 -8.83367840e+00  1.36242259e+01  1.16814159e+00
  1.56071797e+01  3.48039173e+01  1.30295476e+01  1.30331514e+00
  2.32493691e+01  8.58409101e+00  9.23447905e+00  9.97191212e+00
 -1.69482767e+01  8.21529352e+00  8.58382131e+00 -5.13655212e+00
  1.46627305e+00  3.80418148e+00 -1.23711354e+01 -2.79797677e+00
  8.06523941e+00  9.24824783e+00  1.73415223e+01  6.02880540e+00
  1.00275192e+01 -1.44853861e+00  2.34595796e+01  2.19283001e+01
  2.13104241e-01  6.27730254e+00  1.39253411e+01 -1.98662281e+00
  5.53291227e+00 -2.30970711e-01  7.06118584e+00 -3.09881858e+01
 -2.61010890e+00  8.54452902e+00  1.00097595e+00  2.94965413e+01
 -1.44033722e+01  1.38340493e+00  7.69653360e+00  3.65691504e+00
  1.99014739e+01  9.47476828e+00  1.10097290e+01  1.48877266e+01
 -8.69891720e+00  9.34768280e+00 -1.15858794e+00  1.66788361e+01]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Volatility [1.68715106 2.19788722 1.27364601 1.35767892 1.70116879 1.51649264
 1.11697633 1.12418643 1.06848509 1.04829468 1.10696539 1.16333364
 1.11260424 1.05205031 1.38450167 1.97432252 1.77343079 1.05326602
 1.28379478 2.02292079 1.46003013 1.08926624 2.09306542 1.6302589
 1.22225951 1.16580402 1.46038649 1.4809144  1.51165436 1.10503221
 2.40540795 2.43516482 1.00776167 2.59206524 1.88633452 1.13287498
 1.84717959 1.17152528 0.9000662  1.15590486 1.41868781 1.20452559
 1.07767848 1.39443569 2.55955343 1.82599372 1.20166021 1.38668418
 1.49887213 1.49176424 2.0587686  1.08946885 1.26198423 1.49355294
 0.94484686 1.29785712 0.96019059 1.34723905 2.00082766 2.36818555
 1.18923614 0.92902624 4.55981453 1.18547264 1.697942   1.58839801
 0.93581223 0.89547081 1.55765486 1.32334793 2.47400169 1.47236379
 1.03784428 2.29869617 1.38986704 1.36459176 3.05581758 1.55505705
 1.62621905 1.52219355 1.33812318 1.96886384 1.48736738 1.75065524
 2.69254622 0.88993126 1.44421868 1.57788128 1.55194552 1.15989696
 1.38148979 1.19146592 1.18845397 1.6892346  1.81214432 1.44088445
 1.07040583 1.33792353 1.50756854 1.15079692 1.09672735 1.2116427
 2.92369768 1.40930182 1.07851567 1.0680017  1.08104026 0.92725976
 1.40450816 1.94110389 1.30808151 1.05618603 2.36488263 1.23282885
 1.11842457 1.4520483  1.52142989 1.21740068 1.66648249 1.35159454
 1.0625527  1.57874658 1.18605933 1.15145372 1.41139553 1.32060613
 1.34829666 3.79641004 1.23878468 1.14829451 0.90448658 1.76119277
 1.77445407 1.78166051 2.17573838 1.23985802 2.07521591 2.02681808
 0.93954369 1.39034206 1.49406049 1.57848271 1.34451442 1.17702685
 1.66547459 1.52277794 1.34859719 1.9660615  1.58335507 1.33779254
 1.91490726 1.34173146 1.28228628 2.3985804  1.14733221 1.56037194
 1.10344894 3.40049106 2.37335889 1.07845549 1.01392243 1.59462774
 1.18838275 1.61520603 1.08288064 1.46958571 1.15285453 1.22602189
 1.3016302  1.13979942 1.30138165 1.12600938 1.14286861 1.58083869
 1.21837263 1.73299014 1.84176674 1.13033734 1.22468825 0.87040465
 3.13935157 0.88991256 2.07163924 1.20403738 1.56916714 1.60312963
 1.42723709 1.51343434 1.44062152 0.90309766 1.11584212 1.5542353
 1.53629044 1.45701312 1.6097448  1.09587575 1.17777601 1.16932806
 1.64244991 1.4283592  1.921708   0.73316318 1.26879979 1.32154764
 1.13865048 1.49247842 1.71840329 1.03222106 2.16414979 1.03416246
 0.98269841 1.58594449 0.95900798 2.83067523 1.98937101 1.27846009
 3.32538642 1.38038978 1.11537571 2.85118007 2.29930409 2.23082675
 2.50943706 1.56325826 1.02337505 2.53605034 2.60594879 2.42152915
 1.19849259 1.95202036 2.1688136  1.28156631 1.46061929 1.64129973
 1.10458128 3.56017767 1.0663693  1.08937038 1.58951974 1.1328129
 1.25961075 1.4395637  1.03980326 1.26834034 1.18066131 0.80535713
 1.2387481  1.52898492 0.80605597 1.08689787 1.69475119 0.86145337
 1.12053368 1.8759105  1.14342144 1.5330026  1.10905888 1.22746706
 1.37958859 2.95429144 1.13123977 1.92575419 1.94596571 1.5565117
 1.80234528 1.14237004 1.05880661 3.71299533 0.83982097 1.26623967
 1.45693998 2.03078573 1.58011699 1.42648779 1.09196684 1.7738653
 0.8950593  1.13554603 1.08085768 1.20381569 1.12644779 1.43793764
 1.44464394 2.01739436 4.58004173 1.83502828 1.13816285 0.85944183
 1.21780338 2.73065911 1.79726923 1.02296759 1.24775112 1.57834356
 0.95936484 1.43110855 1.58671931 1.85413193 1.57924824 1.26347939
 1.75882399 1.74760571 1.15790642 2.0486974  1.48234857 1.10284754
 1.43029672 0.82640811 0.9493961  1.0348433  1.6269339  1.84570997
 1.0197243  1.45401865 1.37947984 2.45653493 1.4449241  0.8425918
 1.04461471 1.10303264 0.96977356 2.39780299 0.94036597 3.71955968
 1.27305085 1.33806702 1.33191818 3.79478323 2.3979403  1.0150524
 0.99101052 1.37006187 1.00723035 1.86668025 1.16631103 1.84514878
 1.47887743 1.40420566 1.46817588 1.61028462]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in ROE [135 130  21   9  14  10  30  11   2  15   3  35 601  18  25  22   4  19
  23 917  52  24   8  29  82   6  12  38  17  20   7  27  16 687  44 589
 463   1  42  64 205  26   5  13 155  28  98  34  41  51  92 228  36  33
 582 116  68  63 263 167 103 182  61  43 244  73  45  47  32 121  40  48
 596 200 196  37  59 109  60 174  86 142]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Cash Ratio [ 51  77  67 180 272  49  25  14   9  99  47 225  13  74  45 195 131  37
 362  39  58   1  70  80  22 175 163   4  24 128  82  84 133  10  53  12
  36  20 333  38   0  27 237  48  43   3 182  52  11   8  31  60  26  79
 271   2  15 164 201  44 257  94  29  35 958   5  18  81  73 190 496 148
  33 121  30  16 189  92 103  41  40 162 317 130  23 108   7  42  54  61
  46 260 183  17 136 568  62  71  57 117   6 198  65  21 147  64  34 110
 184  68 129  19 116  88 212 115 126  56 127 221 425  83 459 100]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Net Cash Flow [  -604000000     51000000    938000000   -240840000    315120000
   -189000000     90885000    287000000     13900000   -308000000
   -129000000     21818000    -30351000    166000000     50823000
  -2276034000    -34000000   -162000000    -90800000     66000000
   1795000000      3390000     13200000    413000000   -281000000
      7194000   1333000000     -1300000    -38200000     10000000
    698000000  -6430000000    768300000     42000000  -3025000000
   -108953000    218000000     22000000    474000000   -431000000
  20764000000   -712000000   1386000000     -9600000    584000000
    148900000   -433000000     32600000  -3186000000   -268000000
   -220100000  -1039361000 -11208000000   -881000000   1120000000
   -200481000      3190000   1064000000    758700000  -1710600000
   -191000000    -93000000  -3283000000     39289000      2000000
    548000000    -47000000   -119000000     76000000    326500000
   -171460000   -590000000     39000000    150000000    781000000
    -20440000     -8796000    -41000000     -2000000    115100000
    108369000    -22000000  -1763000000    228529000    289000000
   -116000000  -1610000000    375200000   2288000000    -59000000
  -2214800000    848000000     23000000   -325000000     22239000
     46300000   -319396000    683000000  -1179000000    533875000
    830000000  -4496000000   -116800000    249000000    -35000000
     29000000     79000000  -1368707000   1617921000      2196000
    523803000    -14756000      4073000    450000000   -513000000
    -71065000     64600000   4624000000   -119311000    273599000
     28136000   3515000000     14523000    592000000     46600000
   -240000000     46000000    194800000    -19000000    -58589000
    -43239000    -83906000    -30900000    -26905000   -355228000
    254000000  -1603000000    -15576000   2824000000   -809000000
  -3857000000     73901000   -363198000   -685000000     63492000
   7786000000     83583000   -373409000    175000000   -112818000
    162690000    272000000     49000000   -184471000  -1504000000
   7523000000   2300000000     13065000    -17388000   -445000000
    -28325000    636000000   -790000000   -193542000   -296585000
  12747000000   -831000000   -157700000      2448000    114300000
   -900000000    412000000      -395000   -271788000   -218700000
  -7341000000      2212000   -170000000    -86000000  -1649000000
   -211400000    -79600000   -123369000    136400000    -27208000
   -235000000   -205200000   -356000000    -51100000   -638127000
    301000000    274000000   -107000000    610000000     10906000
      1603000     -8000000     85000000    -78836000   5607600000
    537900000    239000000   1944000000    -16185000    403700000
     35300000     59758000   -584000000    -99000000   1805094000
   -952000000  -1098300000   -367000000   1083000000  -1177000000
     -5317000     13624000   -910125000   1010500000    151000000
   -155000000   -126000000     -6000000    379000000    695722000
     -9000000     84000000  -1456000000    128000000   3394000000
    915325000     75400000     36442000    -75150000    217100000
   -134259000   -588000000   -298400000   -403561000    278800000
    -28000000  -1671386000   2962000000    298000000    700900000
    160383000    116000000   -533785000   1735000000   -295000000
     15900000     31884000    625000000   -563000000   2694000000
  -2133000000    -61744000     21000000   -808000000     10853000
    -67676000    -62542000    168081000        23000    -42800000
    615000000     73000000    165012000    -26010000   -654720000
    694000000     88852000  -1016000000     33398000   -167000000
  -2630000000   -648000000    237800000    -38000000    497000000
   1584000000  -3482000000   -150300000      5000000     10716000
     -4636000   -891400000    159000000      6000000     12679000
    250000000    -58000000    100145000   -199000000   -463323000
   1004000000     -8482000     29159000   3428000000     10400000
   -195000000    439000000   1630000000     -3800000    425000000
    142787000    637230000     98989000     37051000     89509000
     -1803000  -6128000000     65488000    -12100000   -460000000
   -254000000  -1268000000   -140000000   -467300000   -568000000
    -12000000   -102075000    373520000      5332000    734422000
   -911000000    133000000    -43000000     17000000  -1032187000
    376000000    -43623000]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Net Income [  7610000000   5144000000   4423000000    629551000    696878000
   1849000000    596541000    636000000   2052300000   2533000000
   2196000000    248710000    141555000    356800000    321406000
    334906000    848000000   2171000000    153900000    144000000
   1377000000    590859000    516000000   6939000000   1562000000
    685074000    596000000    442600000   2560000000   1385000000
 -23528000000  -6692000000    763500000   -322000000    892000000
    741733000   1364000000    476000000   5163000000   5176000000
  15888000000    968000000   2084000000    135400000  -1967000000
   3547000000   3158000000    280900000   1565000000   -239000000
    609700000    583106000  17242000000   2512000000   2834000000
    547132000   1520992000   1757000000   1602000000    664900000
    840000000    410400000 -14685000000    509699000   -271000000
   2094000000    634000000   1384000000    521000000   1247000000
    475602000   1399000000    537000000    355000000   -692000000
   4050000000   -113891000    203523000   1968000000    878000000
   1623600000    319361000   5237000000   4587000000     65900000
   1899000000   4526000000   1953000000   1940000000   2297000000
    709000000   3357400000   8382000000   1034000000   1450000000
    296689000    168800000    869829000    764000000   2816000000
    269732000 -14454000000   1725000000   1002100000   1193000000
    429100000   1117000000  -4524515000    187774000    870120000
     85171000    878485000    232120000    268000000   1979000000
   -156734000    494900000   2269000000    457223000    764465000
    394950000   7373000000    516361000   3669000000    315000000
 -12156000000    578000000    650800000    712000000    241686000
    412512000    267669000    489000000    210219000    546421000
   -196000000   2965000000   1374561000  18108000000   1339000000
   9687000000    705672000    456227000    307000000    768996000
   -671000000    451838000    692957000   2129000000    849073000
   -559235000  -3056000000   1682000000    752207000   4768000000
   2461000000   4554000000    686088000    479058000    558000000
    512951000   1276000000  13190000000    192078000    419247000
  11420000000    938000000    454600000    123241000    588800000
    968100000    427235000    302971000    633700000  24442000000
    894115000   1013000000    253000000   7351000000    483500000
    329200000    802894000    436900000    423223000   -240000000
   2408400000   3605000000    388400000    252111000   2181000000
   3433000000   4476000000   3808000000    350745000    487562000
    859000000    369416000   4529300000    941300000   7267000000
   5310000000    615302000    653500000    401600000    288792000
   1599000000   4833000000    546733000   5241000000   1000400000
   2852000000   4442000000  -2204000000   1079667000    352820000
  -2270833000    847600000    997000000  -2441000000    428000000
   2752000000    220000000    122641000  -3362000000    570000000
   -769000000   1556000000    973800000    357659000    350000000
    283766000    244977000   1093900000    931216000  -7829000000
    260100000    407943000   1604000000    888000000   2551360000
   1679000000   5452000000   6960000000   1234000000    636056000
   1267600000    494090000   6873000000   4106000000    -76400000
    437257000   1406000000    682000000   5642000000   4227000000
    321824000   1547000000   1228000000    304768000    665783000
    357796000    696067000   -713685000    749900000    746000000
   1447000000    196000000    225400000   1053849000    284084000
    606828000   2421000000   2139375000   1156000000    267046000
   1350000000   1933000000   1980000000    798300000  -4556000000
   2214000000   1439000000  13345000000    359500000   -214000000
    459522000    527100000   1975400000    198000000   3439000000
    410395000   1220000000   1540000000    369041000   2986000000
    232573000   7340000000    340383000    680528000   5813000000
    867100000   4772000000   4844000000   7608000000    411500000
   3990000000    221177000    760434000    507577000    375236000
   -556334000    419222000  17879000000    469053000    640300000
  22894000000    783000000    753000000   -571000000    837800000
    506000000    612000000    195290000  -2408948000    984485000
   1201560000  16150000000    251200000    474000000    340000000
  -4359082000   1293000000    147000000    309471000    339000000]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Earnings Per Share [ 11.39    3.15    2.94    1.26    0.31    2.99    8.91    2.6     3.13
   5.88    1.69    1.52    2.08    2.07    1.8     3.01    6.61    5.12
   1.6     0.68    1.13    2.46    9.49    9.15    8.6     1.42    1.28
   3.93    9.73    4.93  -61.2   -13.18    2.47   -0.31    1.21    5.54
   2.86    2.66    3.9     7.52    4.18    1.78    2.59   -4.49   15.38
   2.73    2.05    0.94   -0.18    2.72    3.79    5.41    3.54    8.71
   1.64    4.44    2.26    2.02    2.97    1.55  -22.43    3.52   -2.43
   8.17    3.87    1.53    2.93    3.71   15.3     7.86    1.9    -1.61
   7.15   -0.28    1.79    2.      1.58    2.67    2.01    4.66    0.54
   3.21    5.68    2.17    4.03    5.14    4.92    4.81    4.95    5.08
   1.56    4.68    5.52    4.      4.05    1.27  -35.55    1.43    3.38
   4.07    3.61    5.71   -8.29    3.25    2.37    0.56    2.77    3.5
   0.92    4.25   -0.99    2.3     2.685   2.42    5.87    1.86    1.77
   1.31    1.97  -11.31    1.37    2.22    3.04    1.73    2.85    3.66
   5.42   -0.29    9.23   12.37    1.02    6.11    4.65    2.39    1.14
  11.69   -0.79    0.82    2.35   -1.21  -10.78    3.28    5.78    0.22
   8.54   13.48    5.19    2.41    2.25    1.11    0.58   15.87    5.16
   3.69    1.62    6.05    2.78    0.1     4.41    2.31    5.03    3.03
  -2.97    2.27   11.62    3.36    0.74    3.3     9.71    9.62    3.08
   3.22    1.03    1.08    4.82    4.7     4.49    4.61    3.14    4.31
   7.72    2.79    5.29   -3.26    7.22   12.75  -13.03   -6.07    2.56
   0.43    0.29  -21.18   -1.99    5.13    1.3     1.09    1.17    4.43
   9.32  -10.23    0.86    2.04    4.52    1.81   50.09    3.32    4.11
   2.16    1.38    4.42   -0.42    3.94    5.18    1.01    7.78    1.59
   5.39    1.      5.75    6.17    6.92   -4.29    2.14    5.22    1.04
   1.63   11.38    4.26    3.02    5.43    3.62    4.53    4.21    3.82
   1.94   -1.53    4.96   10.99   12.5     1.98   19.52    6.89    6.1
   3.51    5.51    5.38    8.72    4.13    8.      1.66    3.07    3.29
  -2.31    4.38    5.7     2.36    9.95   -0.76    0.89    1.93  -25.92
   4.22    3.85    0.42    1.88   -4.64    0.78    1.2  ]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Estimated Shares Outstanding [6.68129938e+08 1.63301587e+09 1.50442177e+09 4.99643651e+08
 2.24799355e+09 6.18394649e+08 6.69518519e+07 2.44615385e+08
 4.21897810e+08 4.30782313e+08 1.29940828e+09 1.63625000e+08
 6.80552885e+07 1.72367150e+08 1.78558889e+08 1.11264452e+08
 1.28290469e+08 4.24023438e+08 9.61875000e+07 2.11764706e+08
 1.21858407e+09 2.40186585e+08 5.43730242e+07 7.58360656e+08
 1.81627907e+08 4.82446479e+08 4.65625000e+08 1.12620865e+08
 2.63103803e+08 2.80933063e+08 3.84444444e+08 5.07738998e+08
 3.09109312e+08 1.03870968e+09 7.37190083e+08 1.33886823e+08
 1.04405594e+09 1.78947368e+08 5.06660363e+08 6.88297872e+08
 8.45069512e+08 5.43820225e+08 8.04633205e+08 7.52222222e+07
 4.38084632e+08 2.30624187e+08 1.15677656e+09 1.37024390e+08
 1.66489362e+09 1.32777778e+09 2.24154412e+08 1.53853826e+08
 3.18706100e+09 7.09604520e+08 3.25373134e+08 3.33617073e+08
 3.42565766e+08 7.77433628e+08 7.93069307e+08 2.23872054e+08
 5.41935484e+08 1.31118211e+08 6.54703522e+08 1.44800852e+08
 1.11522634e+08 2.56303550e+08 1.63824289e+08 9.04575163e+08
 1.77815700e+08 3.36118598e+08 3.10850980e+07 1.77989822e+08
 2.82631579e+08 1.18729097e+08 4.29813665e+08 5.66433566e+08
 4.06753571e+08 1.40335196e+08 9.84000000e+08 5.55696202e+08
 6.08089888e+08 1.58886070e+08 1.12381974e+09 1.86463415e+09
 1.22037037e+08 5.91588785e+08 7.96830986e+08 9.00000000e+08
 5.32235888e+08 4.46887160e+08 1.44105691e+08 6.98004158e+08
 1.69333333e+09 2.85433071e+08 1.90185256e+08 3.60683761e+07
 1.57577717e+08 1.91000000e+08 6.95308642e+08 2.12387402e+08
 4.06582278e+08 1.20629371e+09 2.96479290e+08 2.93120393e+08
 1.18864266e+08 3.56869010e+08 1.48511384e+08 5.45779855e+08
 5.77766154e+07 3.67139240e+08 1.52091071e+08 3.17142599e+08
 6.63200000e+07 2.91304348e+08 4.65647059e+08 1.58317172e+08
 2.15173913e+08 2.99887089e+08 1.88935124e+08 1.30232538e+08
 2.49968354e+08 3.96397850e+09 2.91729378e+08 2.80076336e+09
 1.59898477e+08 1.07480106e+09 2.93153153e+08 2.34210526e+08
 1.39702890e+08 1.44741053e+08 1.33168657e+08 1.33606557e+08
 6.91509868e+07 1.00815683e+08 6.75862069e+08 3.21235103e+08
 1.58299351e+08 1.46386419e+09 1.31274510e+09 1.58543372e+09
 1.51757419e+08 1.90889958e+08 2.69298246e+08 6.57823781e+07
 8.49367089e+08 1.25162881e+08 4.14202335e+08 3.61307660e+08
 4.62177686e+08 2.83487941e+08 4.15308642e+08 2.02751213e+08
 7.80360066e+08 5.13987730e+08 1.99237805e+08 8.28820069e+07
 2.53636364e+09 1.49414520e+08 9.78486647e+08 9.27913043e+07
 8.07797688e+07 4.73858921e+09 4.16888889e+08 4.09549550e+08
 2.12484483e+08 3.71014493e+07 3.68023256e+08 4.28362832e+08
 1.15781843e+08 1.25194628e+08 3.91172840e+08 4.04000000e+09
 4.44833333e+08 3.64388489e+08 2.53000000e+09 4.34970414e+09
 1.09637188e+08 1.42510822e+08 2.07466150e+08 1.78246546e+08
 2.13598256e+08 8.08080808e+07 1.06096916e+09 3.10240964e+08
 1.15595238e+08 3.40690540e+08 6.60909091e+08 3.53553038e+08
 4.65280665e+08 1.13333333e+09 7.95340136e+07 2.66770186e+08
 3.44660194e+08 3.42051852e+08 9.39688797e+08 2.00276596e+08
 1.61848552e+09 1.15184382e+09 2.37568340e+08 1.27898089e+08
 6.70051044e+07 5.31229236e+08 6.26036269e+08 1.46954178e+09
 1.96292135e+09 3.58566308e+08 5.39130435e+08 2.81139240e+09
 6.76073620e+08 1.49538366e+08 2.76721569e+07 1.74277283e+08
 3.74812030e+08 4.02141680e+08 1.67187500e+08 4.50409165e+08
 5.11627907e+08 4.22900000e+08 1.58734655e+08 3.67741936e+08
 3.86432161e+08 3.03313840e+08 2.41637717e+08 3.22215315e+08
 2.69230769e+08 2.60335780e+08 2.09382051e+08 2.46930023e+08
 9.99158798e+07 7.65298143e+08 3.02441860e+08 1.99972059e+08
 3.54867257e+08 4.90607735e+08 5.09355161e+07 5.05722892e+08
 6.15929204e+09 3.00243309e+08 4.91391569e+08 5.86851852e+08
 3.58036232e+08 1.55497738e+09 5.46010638e+08 1.81904762e+08
 1.10978934e+08 2.71428571e+08 6.75247525e+08 4.56103476e+08
 5.43316195e+08 2.02405031e+08 2.87012987e+08 1.22800000e+09
 5.30031304e+07 2.19730363e+08 1.03088493e+08 1.31542647e+08
 1.00587717e+08 1.66360140e+08 3.50420561e+08 1.42911877e+08
 1.39134615e+09 1.38282209e+08 9.26053603e+07 2.78513726e+08
 1.29664103e+08 9.31153846e+08 3.63839286e+08 2.71361502e+08
 8.84258278e+07 2.48618784e+08 5.33977901e+08 4.37086093e+08
 1.89619952e+08 8.32330827e+08 3.76701571e+08 5.63080169e+09
 1.85309278e+08 1.39869281e+08 2.25255882e+08 1.25201900e+08
 3.98266129e+08 1.43478261e+08 3.12920837e+08 1.35443894e+08
 1.23200000e+08 1.86384343e+08 3.76024590e+08 2.61833077e+08
 9.87703919e+07 9.52950820e+08 2.47037037e+08 8.66061706e+08
 9.00371747e+08 8.72477064e+08 9.96368039e+07 4.98750000e+08
 1.33239157e+08 2.10646537e+08 1.65334528e+08 1.14053495e+08
 2.40837229e+08 3.32715873e+08 4.08196347e+09 8.22900000e+07
 2.71313559e+08 5.47703349e+09 7.86934673e+07 4.53614458e+08
 7.51315790e+08 5.68539326e+08 1.18146718e+08 1.01186528e+08
 9.29378086e+07 5.07466495e+08 2.84729858e+08 4.19480520e+09
 1.12857143e+09 1.80851064e+08 9.39457328e+08 4.35353535e+08
 1.88461538e+08 2.57892500e+08 4.98529412e+08]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in P/E Ratio [  3.71817366  18.80634984  15.2755102   74.55555714 178.4516129
  12.26755853  31.04040483  16.62692308  18.45654341  10.18707517
  36.66863964  26.33552566  38.72115433  19.77777729  29.23888944
  18.60797276  12.18003056  12.12695313  41.19999875 280.5147059
  16.52212389  21.78455285  16.83456217  17.74098383  12.37441837
  68.27464577 528.0390742   15.18066158  14.33093546  18.70385375
  93.0892875   21.14574899  18.68760706  31.99173471  33.236463
  18.24940665  22.46240602  10.26350566  19.22739309  13.00478493
  21.43258539  14.59845598 105.2444456   19.91872601  15.09890147
  35.47805024  73.18085213  31.46896164  15.89338235  33.6517153
   9.56561922  19.19773983  13.41561401  21.08536707  19.47072005
  24.10619469  59.2871297   13.74074108  16.89677484  13.55910495
  28.40792888  17.61931818  20.81987609  17.91064896  15.28940517
  43.54248562  14.27645119  24.42048464  31.36274549  11.19720127
  18.98947474  22.01003278  17.31307587  10.0951049   33.99441229
  12.9750005   15.92405063  22.47940075  37.63681692  20.98068605
  36.56910528 171.9629648   21.07165078   8.9242956   30.69124332
  14.84223297  10.43190642  14.45934939  14.63970579  21.22828323
  16.87598484  48.4743609   22.20726496  11.10688424  23.29999925
  17.62716025  54.88976299  19.21678322  33.8402358   15.79115405
  30.85041634  18.91693259  11.82311769  93.046152    34.42615865
  18.4368231   68.40285829  32.21739022  12.24470612  34.33913174
  18.63636281  21.17546899  55.82911329   7.57526882  23.06214689
  79.89313282  28.17258883  22.81195132  23.16058394  27.2972964
  30.08552599  16.22543353  16.5684214   20.93532438  10.6912571
  48.0592125   12.17527638  14.51898734  14.88190693  47.36697339
   8.18027502  17.92156961   5.56628445  18.47096753  15.55230042
  28.65789298  17.33019641  18.65928006  13.48780488  13.15758696
  28.94893574  10.73086395  12.23450108  16.95090016  25.30952381
  24.07012104  27.36851246  69.72727273  20.90280972  10.20919844
  35.22705217  23.05202293  14.29460622  16.755556    20.97297387
  46.56896552  34.41461708  17.96124031  14.81415929  19.88075908
  17.33471116  17.03703704  10.91404942  13.16417861  45.79136799
 149.2         25.42011775  16.93197234  18.19047619  12.63824289
  15.90093725  37.11894361   9.29315491  23.49999865  13.04848515
   5.59835232   9.03326424  28.97619077  20.59183628  26.1980526
  27.47572718  25.15740741  24.51037324  21.34893532   9.98663697
  10.45770043  73.12355174  27.24840701  31.68909559  18.42192724
  19.51295324  21.80149775   9.88888889   9.79962193  33.43037975
  16.78393352  26.59843176   4.30451128  22.72265547  17.00327316
  41.8372093  394.4137828   30.06451484  16.48927797  17.88833648
  36.30630541  33.90769385  21.07692308  17.07900767  27.19098691
  18.77906977  10.12254902  10.48672611  29.38673978  25.45318329
  11.65361416  26.93261402  28.5663708   10.94403893  14.72222176
  12.91304348  19.88914118  12.67420186  14.57922079  16.36548299
  19.07722008  33.7920802    6.58124527  10.5141392   12.73584906
  18.99814508  36.200001     9.88347861  33.40263993  87.98541248
  17.33088199  27.42629957  20.5560757   11.58812299  31.66346154
  82.55172759  27.36196196 110.7647088   11.79700833  17.99615423
  33.06802755  23.14084554  39.93377417  11.83425414  14.64900684
  11.43233083  24.32984346  48.4123701   12.50980392  13.57719715
  28.59879153  61.77536232  10.26933585  28.21782178   8.42960024
  25.15151465  19.16433601   2.93545077  28.9         17.34252511
  19.28524574   9.48433077  14.19237695  17.88661766  11.01720183
  19.56416538   8.83874987  57.21084398  27.68975042  25.0423443
  26.55319179  39.60292786  44.78571429  10.55251164  23.61052667
  21.74152585  14.76080352  32.15060181  10.98773006  33.68539326
  14.02509691  35.84974197  18.51030928   9.28436019  20.24675247
  19.41489362  17.68221394 131.5256359   22.74999917  70.47058529]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in P/B Ratio [-8.78421945e+00 -8.75006804e+00 -3.94171377e-01  4.19965109e+00
  1.05980998e+00  7.49683072e+00  1.29064585e+02 -7.19496855e-01
 -3.02264878e+00 -1.88391201e+00 -4.32713829e+00 -1.26933216e+00
 -4.07261517e+00 -9.85570628e+00  4.28235752e+00 -1.36497235e+01
 -1.11465802e+00 -8.77452891e-01 -1.41713889e+01  3.85775599e+00
 -4.49034237e+00 -3.10153798e+01  2.40123217e+01 -1.33983799e+01
 -2.08135771e+01  3.90442953e+00 -7.97010393e+00 -3.10067734e+01
 -7.75985560e+00  4.97080925e+00 -1.28609384e+01  8.20292338e+00
  2.63981367e+00  2.90291480e-01 -3.08947652e+00  3.95497461e+00
 -4.89529412e+00 -6.09073508e-01  2.20326121e+01 -9.38006774e-01
  8.63704546e+00 -8.52562380e-01 -4.12776957e+00  1.34905440e+01
  1.62602199e-01 -3.32129829e+00 -3.89565682e+00  5.88025559e-01
 -3.88092050e+00  1.04481548e+00 -1.16753335e+00  6.26405255e+00
 -1.74661009e+01 -3.41530183e+00 -1.06666788e+01 -7.47716562e+00
 -4.32005119e+00 -3.93528350e-01 -6.30960570e-02 -9.42813353e+00
 -1.84052775e+00  1.11780419e+00 -7.61190775e+01 -8.80528127e+00
 -5.48323699e-01 -5.86495365e+01  1.72013290e+01  2.14394282e+01
  6.36871510e-02 -1.30549296e+00  5.16502890e-01 -7.25643348e-01
 -2.23147395e-01  8.55095541e-01  9.02439024e-01 -1.33832119e+01
  7.12164449e+00 -1.76501314e+00 -7.01980905e+00  4.76393721e+00
  5.67399059e+00 -7.60494471e+00 -1.67300221e+01  3.25222222e+00
  6.27728687e+00 -3.75933827e-01 -4.55221439e+00 -1.37592304e+01
 -3.98503937e+00 -6.62151724e-01 -7.48931346e+00 -1.18774408e+01
 -2.29343975e+00 -1.27172775e+01 -4.42681108e+00  1.96252695e+00
  1.78561644e+00  4.60169855e+00 -1.49288674e+01 -8.11682125e+00
 -6.36928380e+00 -1.23088208e+01  1.41624318e+00  2.38567280e+01
  9.56795153e+00 -1.16983338e+00 -5.97313433e-01 -8.63959070e+00
  6.17402389e+00  6.34974742e+00 -1.71588003e+00  5.99145874e+00
 -4.41034942e+01 -1.41514453e+01  5.10875627e+00  4.42742519e+00
  5.88446716e+00 -2.10070794e+00  2.93542695e+00 -6.07256055e+00
 -1.90866103e+01 -7.97573034e+00  4.01471293e+00  1.49926228e+01
  6.74676025e+00  5.10154601e+00 -3.97339544e+00  2.25637911e+01
  1.04977041e+01  4.24299831e+00  3.15944610e+00  3.61761016e+00
 -4.89203675e+00  7.20524245e+00  5.76005679e+00  3.83589577e+00
  1.21128792e+01  1.73458569e+01  1.21453261e+01 -6.50573700e-02
 -7.27905120e+00 -3.73804696e+00  6.26481675e+00  6.06938909e+00
 -3.70982592e+00 -1.98048307e+00  5.92567697e+00 -1.13548387e-01
  4.85239120e+00 -9.81083310e-01  5.04769952e+00  6.12393390e+00
  2.65657721e-01 -2.76365122e+00  4.26075000e+01  7.58647709e+00
  4.21861998e+00  2.82384519e+00  6.29494262e+00  2.75223607e+00
 -1.88688119e+00 -2.53301086e+00 -1.89407115e+00 -1.46630663e+00
 -2.01209100e+00  2.93100547e+00 -3.07831974e-01 -1.29484372e+00
 -8.57290222e-01  1.42807500e+01 -6.51102807e-01 -1.08528544e+01
 -4.60659114e+00  1.98214162e+01 -5.11719395e+00 -2.24577338e+00
  1.03163539e+01  3.45176471e+00 -1.23701979e+01  2.21957746e+00
  4.53525099e+00  7.12214514e+00  8.61558483e+00 -1.28095060e+01
 -3.98031573e+00  6.49575517e+00  3.05088697e+00 -1.95019387e+00
  2.02384440e+00 -5.19073368e+00 -6.63297081e+00  5.84661735e+00
  5.79822581e+00 -4.28293111e+00  1.27352995e+00  4.40399354e+00
 -1.29800623e+00 -4.21330891e+00 -1.88094283e+00  1.17122901e+00
 -1.17173832e+01 -7.35331395e+00  6.97186364e+00 -5.70016789e+00
 -1.38596074e-01 -1.23755263e+01  9.58253576e+00  9.26433162e-01
  1.11681066e+01 -2.07554286e+00 -8.02511003e+00 -1.04640982e+01
 -3.64026219e-01  3.34510155e+00 -4.26858900e-01 -7.33207433e-01
  6.29052120e+00 -1.12105856e+00 -1.05242872e+00 -3.61858249e-01
 -4.52699514e+00 -2.25674696e+00 -8.43313348e-01 -1.41802706e+00
 -6.94125670e-01 -6.57486911e+00 -6.08922771e+00 -5.93157895e-01
 -2.82711144e+00 -4.17892734e+00  7.02905607e+00  4.29189430e+00
  5.74886878e-01  5.43403909e+00 -1.20208938e+01 -1.57274805e+01
  2.04089995e+01  4.08947221e+00 -1.62154690e+01  5.25089724e-01
 -2.42822510e+00 -4.01646113e+00 -1.30089841e-01 -2.58040816e+00
 -2.71690772e+00  2.82536561e+00 -7.96157903e+00 -2.79545642e+00
 -1.88641943e+01 -8.54722222e+00 -2.48137610e+00 -4.04496970e+00
  7.41377678e+00 -8.42213189e-01  7.02678249e+00 -2.35373233e+01
 -2.53851293e+01  4.06808411e+00 -1.27265533e+01 -2.80325119e+01
  2.62757576e+00 -8.91599302e-01  6.01095386e+00  4.59415584e+00
 -2.34739137e+00  2.76805090e+00 -2.31952916e+01  6.25590309e+00
  1.06689857e+00  1.06955822e+00 -1.31980547e+01  9.47139976e+00
  1.52621554e+01 -2.66190517e-01 -1.08191192e+00 -1.36174399e+01
  4.07654319e+00  2.55967070e+00 -4.04075101e+00 -2.63806868e+01
  2.95471503e+01 -1.85099485e+00 -4.50863346e+01 -1.41529880e+00
 -1.45611208e+01 -8.04377178e+00  2.28480237e+00 -1.02499673e+01
  1.26957118e+01  7.18612812e+00 -2.26192667e+00 -7.76267729e+00
 -2.70644272e+00 -2.95949367e-01  4.13047059e+00  6.26177457e+00
 -3.83825986e+00 -2.38844490e+01  1.72306785e+00]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

No issues when checking unique values.

Exploratory Data Analysis (EDA)¶

  • EDA is an important part of any project involving data.
  • It is important to investigate and understand the data better before building a model with it.
  • A few questions have been mentioned below which will help you approach the analysis in the right manner and generate insights from the data.
  • A thorough analysis of the data, in addition to the questions mentioned below, should be done.
In [10]:
#Plotting boxplot and histogram for numeric data

def histogram_boxplot(data, feature, figsize=(12, 7), kde=True, bins=None):
    """
    data: df
    feature: numeric column
   
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  
        sharex=True,  # same x-axis
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  

    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="lightgreen"
    )  
    
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
In [11]:
#Barplot for categorical columns

def labeled_barplot(data, feature, perc=False, n=None):
    """
    data: dataframe
    feature: dataframe column
    
    """

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # width of the plot
        y = p.get_height()  # height of the plot

        # annotate the percentage
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  

    plt.show()  

Ticker Symbol¶

In [12]:
df['Ticker Symbol'].value_counts()
Out[12]:
AAL     1
NEE     1
NUE     1
NTRS    1
NSC     1
       ..
EQR     1
EQIX    1
EOG     1
EMN     1
ZTS     1
Name: Ticker Symbol, Length: 340, dtype: int64

All values are distinct.

Security¶

In [13]:
df['Security'].value_counts()
Out[13]:
American Airlines Group    1
NextEra Energy             1
Nucor Corp.                1
Northern Trust Corp.       1
Norfolk Southern Corp.     1
                          ..
Equity Residential         1
Equinix                    1
EOG Resources              1
Eastman Chemical           1
Zoetis                     1
Name: Security, Length: 340, dtype: int64

All values are distinct.

GICS Sector¶

In [14]:
df['GICS Sector'].value_counts()
Out[14]:
Industrials                    53
Financials                     49
Health Care                    40
Consumer Discretionary         40
Information Technology         33
Energy                         30
Real Estate                    27
Utilities                      24
Materials                      20
Consumer Staples               19
Telecommunications Services     5
Name: GICS Sector, dtype: int64
In [15]:
labeled_barplot(df, 'GICS Sector')

Each company is labelled with its GICS economic sector. The largest sectors are Industrials (53), Financials (49), Health Care (40) and Consumer Discretionary (40). Telecommunications Services is the smallest sector, with only 5 companies.
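
To put the sector counts on a common footing, they can be converted to percentage shares. The sketch below rebuilds the counts by hand from the output above (the names `sector_counts` and `sector_share` are illustrative and not part of the notebook); on the notebook's `df`, `df['GICS Sector'].value_counts(normalize=True)` should give the same proportions.

```python
import pandas as pd

# Sector counts copied from the value_counts() output above
sector_counts = pd.Series(
    {
        "Industrials": 53,
        "Financials": 49,
        "Health Care": 40,
        "Consumer Discretionary": 40,
        "Information Technology": 33,
        "Energy": 30,
        "Real Estate": 27,
        "Utilities": 24,
        "Materials": 20,
        "Consumer Staples": 19,
        "Telecommunications Services": 5,
    }
)

# Convert counts to percentage shares of all stocks
sector_share = (100 * sector_counts / sector_counts.sum()).round(1)
print(sector_share)
```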

GICS Sub Industry¶

In [16]:
df['GICS Sub Industry'].value_counts()
Out[16]:
Oil & Gas Exploration & Production            16
REITs                                         14
Industrial Conglomerates                      14
Electric Utilities                            12
Internet Software & Services                  12
                                              ..
Technology Hardware, Storage & Peripherals     1
Real Estate Services                           1
Trucking                                       1
Networking Equipment                           1
Casinos & Gaming                               1
Name: GICS Sub Industry, Length: 104, dtype: int64
In [17]:
labeled_barplot(df, 'GICS Sub Industry')

Each company is also labelled with its GICS sub-industry. There are 104 distinct sub-industries; the most frequent are Oil & Gas Exploration & Production (16), REITs (14) and Industrial Conglomerates (14).

Current Price¶

In [18]:
histogram_boxplot(df, 'Current Price')

The distribution of current price is right skewed, with a median of about 59.71. There are outliers on the right side.

In [19]:
df['Current Price'].median()
Out[19]:
59.705
In [20]:
df['Current Price'].max()
Out[20]:
1274.949951

The highest value for current price is approximately 1274.95.
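
A skew reading taken from a plot can be cross-checked numerically. The following is a minimal sketch on synthetic data, not on the notebook's `df`: the log-normal `prices` series is a hypothetical stand-in for a right-skewed price column. A positive `skew()` and a mean above the median both point to a longer right tail.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for a price column: log-normal values are right skewed
prices = pd.Series(rng.lognormal(mean=4.0, sigma=0.8, size=340), name="price")

summary = {
    "mean": prices.mean(),
    "median": prices.median(),
    "skew": prices.skew(),  # > 0 suggests a longer right tail
}
print(summary)
```

The same three statistics could be computed on any numeric column of the notebook's data to back up the visual description.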

Price Change¶

In [21]:
histogram_boxplot(df, 'Price Change')

The distribution for price change is slightly left skewed, but still resembles a normal distribution with outliers present on both sides.

Volatility¶

In [22]:
histogram_boxplot(df, 'Volatility')

The volatility represents the standard deviation of the stock price over the last 13 weeks. The distribution for volatility is right skewed with outliers. The data is centered around 1.4.

ROE¶

In [23]:
histogram_boxplot(df, 'ROE')

ROE is a measure of financial performance. The distribution for ROE is right skewed with outliers present. The data is centered around 15.

Cash Ratio¶

In [24]:
histogram_boxplot(df, 'Cash Ratio')

Cash Ratio represents a company's cash relative to its liabilities. The distribution is right skewed with outliers present. The data is centered around 47.

Net Cash Flow¶

In [25]:
histogram_boxplot(df, 'Net Cash Flow')
In [26]:
df['Net Cash Flow'].median()
Out[26]:
2098000.0

Net Cash Flow is the difference between cash inflows and outflows, in dollars. The distribution resembles a normal distribution with outliers present on both sides. The median is 2,098,000.

Net Income¶

In [27]:
histogram_boxplot(df, 'Net Income')
In [28]:
df['Net Income'].median()
Out[28]:
707336000.0

Net Income is revenue after expenses, interest and taxes, in dollars. The distribution is right skewed with outliers present on both sides. The median is 707,336,000.

Earnings Per Share¶

In [29]:
histogram_boxplot(df, 'Earnings Per Share')
In [30]:
df['Earnings Per Share'].median()
Out[30]:
2.895

Earnings Per Share is the company's net income divided by the total number of common shares, in dollars. The distribution resembles a normal distribution with outliers present on both sides. The median is about 2.9.

Estimated Shares Outstanding¶

In [31]:
histogram_boxplot(df, 'Estimated Shares Outstanding')
In [32]:
df['Estimated Shares Outstanding'].median()
Out[32]:
309675137.79999995

The distribution of estimated shares outstanding is heavily right skewed with outliers present. The median is about 309.7 million shares.

P/E Ratio¶

In [33]:
histogram_boxplot(df, 'P/E Ratio')
In [34]:
df['P/E Ratio'].median()
Out[34]:
20.81987609

P/E Ratio is the ratio of a company's stock price to its earnings per share. The distribution is right skewed with outliers present. The median is about 20.8.

P/B Ratio¶

In [35]:
histogram_boxplot(df, 'P/B Ratio')
In [36]:
df['P/B Ratio'].median()
Out[36]:
-1.0671703205

P/B Ratio is the ratio of a company's stock price per share to its book value per share. The distribution resembles a normal distribution with outliers present on both sides. The median is about -1.1.

Bivariate Analysis¶

In [37]:
num_cols = df.select_dtypes(include=np.number).columns.tolist()
In [38]:
plt.figure(figsize=(10,5))
sns.heatmap(df[num_cols].corr(), annot=True)
Out[38]:
<Axes: >
In [39]:
sns.pairplot(data=df[num_cols], diag_kind="kde")
plt.show()

Volatility seems to be negatively correlated with price change. Earnings per share seems to be correlated with current price and net income.

Questions¶

Questions:

  1. What does the distribution of stock prices look like?
  2. The stocks of which economic sector have seen the maximum price increase on average?
  3. How are the different variables correlated with each other?
  4. Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. How does the average cash ratio vary across economic sectors?
  5. P/E ratios can help determine the relative value of a company's shares as they signify the amount of money an investor is willing to invest in a single share of a company per dollar of its earnings. How does the P/E ratio vary, on average, across economic sectors?

Q1¶

In [40]:
histogram_boxplot(df, 'Current Price')

The distribution for Current price (of Stock in dollars) is right skewed with outliers present on the right. The data is centered around 59.71 with the max value being approximately 1274.95.

Q2¶

In [41]:
q2 = df.groupby('GICS Sector')['Price Change'].mean()
q2
Out[41]:
GICS Sector
Consumer Discretionary          5.846093
Consumer Staples                8.684750
Energy                        -10.228289
Financials                      3.865406
Health Care                     9.585652
Industrials                     2.833127
Information Technology          7.217476
Materials                       5.589738
Real Estate                     6.205548
Telecommunications Services     6.956980
Utilities                       0.803657
Name: Price Change, dtype: float64
In [42]:
plt.figure(figsize= (10,5))
sns.boxplot(df, x= 'GICS Sector', y='Price Change')
plt.xticks(rotation=90);

Health Care, Consumer Staples and Information Technology have the highest average price change, in that order. Health Care and Information Technology have a wider distribution than most other sectors. Energy has the highest variance and is also the only sector with a negative average price change (about -10.2).
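
The ranking can be read off programmatically rather than by eye. This sketch re-enters the sector means from the output above (the names `mean_change`, `ranked` and `top3` are illustrative); on the notebook's data, sorting `q2` directly would be the equivalent step, and `.agg(["mean", "std"])` on the same groupby could add a measure of spread.

```python
import pandas as pd

# Mean price change per sector, copied from the groupby output above
mean_change = pd.Series(
    {
        "Consumer Discretionary": 5.846093,
        "Consumer Staples": 8.684750,
        "Energy": -10.228289,
        "Financials": 3.865406,
        "Health Care": 9.585652,
        "Industrials": 2.833127,
        "Information Technology": 7.217476,
        "Materials": 5.589738,
        "Real Estate": 6.205548,
        "Telecommunications Services": 6.956980,
        "Utilities": 0.803657,
    }
)

# Sort from largest to smallest average price change
ranked = mean_change.sort_values(ascending=False)
top3 = ranked.head(3)
print(top3)
print("Lowest:", ranked.index[-1], ranked.iloc[-1])
```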

Q3¶

In [43]:
plt.figure(figsize=(10,5))
sns.heatmap(df[num_cols].corr(), annot=True)
Out[43]:
<Axes: >

A heatmap of the numerical columns has been created to find any correlations between variables. Earnings per share is slightly correlated with Current price and Net income. Estimated shares outstanding and Net income are also slightly correlated.
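
Reading the strongest pairs off an annotated heatmap gets tedious with many columns. One possible approach, sketched here on hypothetical toy data (`toy`, with columns `a`, `b`, `c`, is not from the notebook), is to keep the upper triangle of the correlation matrix and sort the pairs by absolute correlation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical toy data: "b" is built to track "a", "c" is independent noise
a = rng.normal(size=200)
toy = pd.DataFrame(
    {"a": a, "b": a + 0.1 * rng.normal(size=200), "c": rng.normal(size=200)}
)

corr = toy.corr()
# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = (
    corr.where(mask)
    .stack()
    .dropna()
    .sort_values(key=np.abs, ascending=False)
)
print(pairs)
```

Applied to `df[num_cols].corr()`, the same steps would list the notebook's variable pairs from most to least correlated.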

Q4¶

In [44]:
q4 = df.groupby('GICS Sector')['Cash Ratio'].mean()
q4
Out[44]:
GICS Sector
Consumer Discretionary          49.575000
Consumer Staples                70.947368
Energy                          51.133333
Financials                      98.591837
Health Care                    103.775000
Industrials                     36.188679
Information Technology         149.818182
Materials                       41.700000
Real Estate                     50.111111
Telecommunications Services    117.000000
Utilities                       13.625000
Name: Cash Ratio, dtype: float64
In [45]:
plt.figure(figsize= (10,5))
sns.boxplot(df, x= 'GICS Sector', y='Cash Ratio')
plt.xticks(rotation=90);

Information Technology, Telecommunications Services and Health Care have the highest average cash ratio, in that order. Information Technology and Health Care also have the highest variance compared to other sectors. The Utilities sector has the lowest average cash ratio.

Q5¶

In [46]:
q5 = df.groupby('GICS Sector')['P/E Ratio'].mean()
q5
Out[46]:
GICS Sector
Consumer Discretionary         35.211613
Consumer Staples               25.521195
Energy                         72.897709
Financials                     16.023151
Health Care                    41.135272
Industrials                    18.259380
Information Technology         43.782546
Materials                      24.585352
Real Estate                    43.065585
Telecommunications Services    12.222578
Utilities                      18.719412
Name: P/E Ratio, dtype: float64
In [47]:
plt.figure(figsize= (10,5))
sns.boxplot(df, x= 'GICS Sector', y='P/E Ratio')
plt.xticks(rotation=90);

Energy, Information Technology, Real Estate and Health Care have the highest average P/E ratio, in that order. Energy has very high variance in P/E ratio relative to other sectors.

Data Preprocessing¶

  • Duplicate value check
  • Missing value treatment
  • Outlier check
  • Feature engineering (if needed)
  • Any other preprocessing steps (if needed)

Duplicate value Check¶

In [48]:
df.duplicated().sum()
Out[48]:
0

There are no duplicate values.

Missing value treatment¶

In [49]:
df.isna().sum()
Out[49]:
Ticker Symbol                   0
Security                        0
GICS Sector                     0
GICS Sub Industry               0
Current Price                   0
Price Change                    0
Volatility                      0
ROE                             0
Cash Ratio                      0
Net Cash Flow                   0
Net Income                      0
Earnings Per Share              0
Estimated Shares Outstanding    0
P/E Ratio                       0
P/B Ratio                       0
dtype: int64

There are no missing values.

Outlier Check¶

In [50]:
#Viewing boxplots for outliers
outliers = df.select_dtypes(include=np.number).columns.tolist()

plt.figure(figsize=(15, 10))

for i, variable in enumerate(outliers):
    plt.subplot(3, 5, i + 1)
    sns.boxplot(data=df, x=variable)
    plt.tight_layout(pad=2)

plt.show()

Outliers are present in the data, but they will not be treated as they might hold valuable insight.
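
The boxplots flag outliers using the usual 1.5 × IQR whisker rule, and the same rule can be applied numerically to count them per column. The helper `iqr_outlier_counts` and the toy frame below are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd


def iqr_outlier_counts(frame: pd.DataFrame) -> pd.Series:
    """Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for each numeric column."""
    numeric = frame.select_dtypes(include=np.number)
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the matching columns
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    return outside.sum()


# Hypothetical toy frame: one extreme value in "x", none in "y"
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5, 100], "y": [10, 11, 12, 13, 14, 15]})
counts = iqr_outlier_counts(toy)
print(counts)
```

Calling the helper on the notebook's `df` would give a per-column outlier count to accompany the plots.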

Feature Engineering¶

No Feature engineering required.

Preprocessing¶

In [51]:
#dropping first two columns as they are all unique values
df.drop("Ticker Symbol", axis=1, inplace= True)
df.drop("Security", axis=1, inplace= True)

df.head()
Out[51]:
GICS Sector GICS Sub Industry Current Price Price Change Volatility ROE Cash Ratio Net Cash Flow Net Income Earnings Per Share Estimated Shares Outstanding P/E Ratio P/B Ratio
0 Industrials Airlines 42.349998 9.999995 1.687151 135 51 -604000000 7610000000 11.39 6.681299e+08 3.718174 -8.784219
1 Health Care Pharmaceuticals 59.240002 8.339433 2.197887 130 77 51000000 5144000000 3.15 1.633016e+09 18.806350 -8.750068
2 Health Care Health Care Equipment 44.910000 11.301121 1.273646 21 67 938000000 4423000000 2.94 1.504422e+09 15.275510 -0.394171
3 Information Technology Application Software 93.940002 13.977195 1.357679 9 180 -240840000 629551000 1.26 4.996437e+08 74.555557 4.199651
4 Information Technology Semiconductors 55.320000 -1.827858 1.701169 14 272 315120000 696878000 0.31 2.247994e+09 178.451613 1.059810
In [52]:
#Creating a list of numerical columns 
num_cols = df.select_dtypes(include=np.number).columns.tolist()
print(num_cols)
['Current Price', 'Price Change', 'Volatility', 'ROE', 'Cash Ratio', 'Net Cash Flow', 'Net Income', 'Earnings Per Share', 'Estimated Shares Outstanding', 'P/E Ratio', 'P/B Ratio']
In [53]:
# Scaling the data set before clustering with standard scaler on numerical columns
scaler = StandardScaler()
subset = df[num_cols].copy()
subset_scaled = scaler.fit_transform(subset)
In [54]:
#Scaled data set
scaled_df = pd.DataFrame(subset_scaled, columns=subset.columns)
In [55]:
#First 5 rows of scaled data
scaled_df.head()
Out[55]:
Current Price Price Change Volatility ROE Cash Ratio Net Cash Flow Net Income Earnings Per Share Estimated Shares Outstanding P/E Ratio P/B Ratio
0 -0.393341 0.493950 0.272749 0.989601 -0.210698 -0.339355 1.554415 1.309399 0.107863 -0.652487 -0.506653
1 -0.220837 0.355439 1.137045 0.937737 0.077269 -0.002335 0.927628 0.056755 1.250274 -0.311769 -0.504205
2 -0.367195 0.602479 -0.427007 -0.192905 -0.033488 0.454058 0.744371 0.024831 1.098021 -0.391502 0.094941
3 0.133567 0.825696 -0.284802 -0.317379 1.218059 -0.152497 -0.219816 -0.230563 -0.091622 0.947148 0.424333
4 -0.260874 -0.492636 0.296470 -0.265515 2.237018 0.133564 -0.202703 -0.374982 1.978399 3.293307 0.199196

We have scaled the data so that columns with larger values do not dominate the distance calculations used in clustering. The first five rows confirm that the data has been scaled.
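
As a quick illustration of what the scaler does, the sketch below compares `StandardScaler` with a manual z-score, z = (x - mean) / std, on a small hypothetical matrix `X` (not the notebook's data). `StandardScaler` uses the population standard deviation, which matches NumPy's default `ddof=0`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy matrix with two columns on very different scales
X = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0], [4.0, 6000.0]])

# Manual z-score: subtract the column mean, divide by the population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, every column has mean 0 and standard deviation 1, so no single column dominates a Euclidean distance.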

In [56]:
#Creating another copy to append labels later
#Doing this so labels won't influence clustering from k-means to hc
df1=df.copy()
scaled_df1=scaled_df.copy()

EDA¶

  • It is a good idea to explore the data once again after manipulating it.

Ticker Symbol and Security (the company name) have been dropped because every value is unique.

In [57]:
#Plotting categorical columns together at once
cat_cols = ["GICS Sector", "GICS Sub Industry"]

for feature in df[cat_cols]:
    labeled_barplot(df, feature)

GICS Sector and sub industry distribution remains the same.

In [58]:
#Using function and loop to plot all columns at once with scaled data
for feature in df[num_cols]:
    histogram_boxplot(scaled_df, feature) 

The scaled data keeps the same distribution shapes; only the scale of the values changes, which mitigates bias from columns with larger numbers.

K-means Clustering¶

K-means and Elbow Plot¶

In [59]:
%%time

clusters = range(1, 16) #Picking range from 1-15 to test clustering
meanDistortions = []

for k in clusters:
    model = KMeans(n_clusters=k, random_state=0)
    model.fit(scaled_df)
    prediction = model.predict(scaled_df)
    distortion = (
        sum(
            np.min(cdist(scaled_df, model.cluster_centers_, "euclidean"), axis=1)
        )
        / scaled_df.shape[0]
    )

    meanDistortions.append(distortion)

    print("Number of Clusters:", k, "\tAverage Distortion:", distortion)

plt.plot(clusters, meanDistortions, "bx-")
plt.xlabel("Number of Clusters")
plt.ylabel("Average Distortion")
plt.title("Elbow Plot")
Number of Clusters: 1 	Average Distortion: 2.5425069919221697
Number of Clusters: 2 	Average Distortion: 2.382318498894466
Number of Clusters: 3 	Average Distortion: 2.2683105560042285
Number of Clusters: 4 	Average Distortion: 2.1745559827866363
Number of Clusters: 5 	Average Distortion: 2.1147830379797616
Number of Clusters: 6 	Average Distortion: 2.0872686595048133
Number of Clusters: 7 	Average Distortion: 2.008680132690643
Number of Clusters: 8 	Average Distortion: 1.9711152823639846
Number of Clusters: 9 	Average Distortion: 1.8905345519244967
Number of Clusters: 10 	Average Distortion: 1.8568527333547074
Number of Clusters: 11 	Average Distortion: 1.8574273605424652
Number of Clusters: 12 	Average Distortion: 1.786263657222773
Number of Clusters: 13 	Average Distortion: 1.7118438968536325
Number of Clusters: 14 	Average Distortion: 1.6825666740823932
Number of Clusters: 15 	Average Distortion: 1.6665030856205127
CPU times: user 2.05 s, sys: 27.7 ms, total: 2.08 s
Wall time: 1.55 s
Out[59]:
Text(0.5, 1.0, 'Elbow Plot')

From the elbow plot, there is no distinct point to select. There are slight bends at 5, 7, 9 and 10.
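
When the elbow is hard to see, the relative drop in distortion between consecutive values of k can help. This sketch reuses the average distortions printed above (the names `distortions` and `pct_drop` are illustrative, not from the notebook).

```python
import numpy as np

# Average distortions for k = 1..15, copied from the output above
distortions = np.array([
    2.5425069919221697, 2.382318498894466, 2.2683105560042285,
    2.1745559827866363, 2.1147830379797616, 2.0872686595048133,
    2.008680132690643, 1.9711152823639846, 1.8905345519244967,
    1.8568527333547074, 1.8574273605424652, 1.786263657222773,
    1.7118438968536325, 1.6825666740823932, 1.6665030856205127,
])

# Percentage drop when moving from k to k + 1 clusters
pct_drop = -100 * np.diff(distortions) / distortions[:-1]
for k, drop in zip(range(1, 15), pct_drop):
    print(f"k={k} -> k={k + 1}: {drop:.2f}% lower distortion")
```

The drops fluctuate instead of levelling off, and the step from k=10 to k=11 slightly increases the distortion, which is consistent with the lack of a clear elbow.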

Silhouette Coefficients¶

In [60]:
%%time

sil_score = []
cluster_list = list(range(2, 15))
for n_clusters in cluster_list:
    clusterer = KMeans(n_clusters=n_clusters, random_state=0)
    preds = clusterer.fit_predict((scaled_df))
    score = silhouette_score(scaled_df, preds)
    sil_score.append(score)
    print("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))

plt.plot(cluster_list, sil_score)
plt.xlabel("Number of Clusters")
plt.ylabel("Silhouette Score")
For n_clusters = 2, silhouette score is 0.43969639509980457
For n_clusters = 3, silhouette score is 0.45797710447228496
For n_clusters = 4, silhouette score is 0.4577225970476733
For n_clusters = 5, silhouette score is 0.35515084792732604
For n_clusters = 6, silhouette score is 0.4315903528127779
For n_clusters = 7, silhouette score is 0.4025633625337274
For n_clusters = 8, silhouette score is 0.40485971473985305
For n_clusters = 9, silhouette score is 0.10450448075395784
For n_clusters = 10, silhouette score is 0.12002136446835195
For n_clusters = 11, silhouette score is 0.2178504146887798
For n_clusters = 12, silhouette score is 0.13060376012568126
For n_clusters = 13, silhouette score is 0.1757117204389242
For n_clusters = 14, silhouette score is 0.18300878905519907
CPU times: user 2.4 s, sys: 1 s, total: 3.4 s
Wall time: 1.8 s
Out[60]:
Text(0, 0.5, 'Silhouette Score')

Based on the silhouette scores, k=3 and k=4 perform best (both about 0.46), followed by k=2 (0.44) and k=6 (0.43), with a drastic drop-off at k=9. The score generally decreases as k increases and peaks at only about 0.46.
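
The printed scores can be ranked directly instead of being compared by eye. The series below is copied from the output above; the names `scores` and `ranked_k` are illustrative.

```python
import pandas as pd

# Silhouette score for each k, copied from the output above
scores = pd.Series(
    {
        2: 0.43969639509980457,
        3: 0.45797710447228496,
        4: 0.4577225970476733,
        5: 0.35515084792732604,
        6: 0.4315903528127779,
        7: 0.4025633625337274,
        8: 0.40485971473985305,
        9: 0.10450448075395784,
        10: 0.12002136446835195,
        11: 0.2178504146887798,
        12: 0.13060376012568126,
        13: 0.1757117204389242,
        14: 0.18300878905519907,
    }
)

# Rank the candidate k values from best to worst silhouette score
ranked_k = scores.sort_values(ascending=False)
print(ranked_k.head(3).round(3))
```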

Silhouette Plots¶

In [61]:
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(10, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();

With 10 clusters the large group has been split up. There are some negative scores present.

In [62]:
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(9, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();

Cluster 2 holds the majority of entries with other clusters being smaller in comparison. There are some negative scores present.

In [63]:
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(8, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();

One cluster is dominating the grouping and other clusters are small with some negative scores.

In [64]:
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(7, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();

Cluster 0 holds the majority of entries with other clusters being small in comparison. There are some negative scores present.

In [65]:
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(6, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();

Cluster 0 contains the majority of entries with other clusters being small in comparison. There are negative scores present as well.

In [66]:
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(5, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();

Cluster 1 holds the majority of the entries, with the other clusters being much smaller and some negative scores present.

In [67]:
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(4, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();

Cluster 0 holds the majority of the entries with other clusters being smaller in comparison. Cluster 3 is showing large amounts of negative scoring.

The outliers in the data may be leading to low silhouette coefficient values and negative values. The best score seems to be about 0.4. K=7 seems to give the best balance between coefficient values and also gave a slight indication on the elbow plot.
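
The negative scores seen in the silhouette plots can also be counted per cluster. The sketch below uses scikit-learn's `silhouette_samples` on hypothetical toy blobs that are deliberately over-split, not on the notebook's `scaled_df`; setting `n_init=10` explicitly is a choice made here, not something taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Hypothetical toy data: three blobs, deliberately over-split into four clusters
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# One silhouette coefficient per sample, each in [-1, 1]
sample_scores = silhouette_samples(X, labels)

for cluster in np.unique(labels):
    cluster_scores = sample_scores[labels == cluster]
    print(
        f"cluster {cluster}: size={cluster_scores.size}, "
        f"mean silhouette={cluster_scores.mean():.3f}, "
        f"negative share={np.mean(cluster_scores < 0):.1%}"
    )

# The overall silhouette score is the mean of the per-sample values
print(np.isclose(sample_scores.mean(), silhouette_score(X, labels)))
```

Running the same loop on the notebook's scaled data and fitted labels would show which clusters contribute the negative coefficients.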

In [68]:
# let's take 7 as number of clusters
kmeans = KMeans(n_clusters=7, random_state=0)
kmeans.fit(scaled_df)
Out[68]:
KMeans(n_clusters=7, random_state=0)
In [69]:
# adding kmeans cluster labels to the original and scaled dataframes

df1["K_means_segments"] = kmeans.labels_
scaled_df1["K_means_segments"] = kmeans.labels_

Cluster Profiling¶

In [70]:
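# numeric_only=True averages only the numeric columns; the text columns (GICS Sector, GICS Sub Industry) are skipped
cluster_profile = df1.groupby("K_means_segments").mean(numeric_only=True)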
In [71]:
cluster_profile["count_in_each_segments"] = (
    df1.groupby("K_means_segments")["Current Price"].count().values
)
In [72]:
cluster_profile.style.highlight_max(color="lightblue", axis=0)
Out[72]:
  Current Price Price Change Volatility ROE Cash Ratio Net Cash Flow Net Income Earnings Per Share Estimated Shares Outstanding P/E Ratio P/B Ratio count_in_each_segments
K_means_segments                        
0 48.103077 6.053507 1.163964 27.538462 77.230769 773230769.230769 14114923076.923077 3.958462 3918734987.169230 16.098039 -4.253404 13
1 73.281231 5.002456 1.373721 25.303030 51.018939 5792560.606061 1517540458.333333 3.773201 422805643.026553 23.232765 -3.313539 264
2 26.990000 -14.060688 3.296307 603.000000 57.333333 -585000000.000000 -17555666666.666668 -39.726667 481910081.666667 71.528835 1.638633 3
3 108.304002 10.737770 1.165694 566.200000 26.600000 -278760000.000000 687180000.000000 1.548000 349607057.720000 34.898915 -16.851358 5
4 632.714991 7.374164 1.541343 19.333333 158.333333 -24046333.333333 907393166.666667 16.270000 125797901.323333 123.049240 35.355736 6
5 95.281515 14.717580 1.814754 25.954545 308.909091 645568272.727273 871490181.818182 2.006364 730848546.662727 57.950455 7.992920 22
6 37.282919 -14.529500 2.820301 40.666667 47.555556 -133624777.777778 -1904442925.925926 -4.957037 503635899.112593 86.787432 1.378738 27
In [73]:
cluster_profile.style.highlight_min(color="orange", axis=0)
Out[73]:
  Current Price Price Change Volatility ROE Cash Ratio Net Cash Flow Net Income Earnings Per Share Estimated Shares Outstanding P/E Ratio P/B Ratio count_in_each_segments
K_means_segments                        
0 48.103077 6.053507 1.163964 27.538462 77.230769 773230769.230769 14114923076.923077 3.958462 3918734987.169230 16.098039 -4.253404 13
1 73.281231 5.002456 1.373721 25.303030 51.018939 5792560.606061 1517540458.333333 3.773201 422805643.026553 23.232765 -3.313539 264
2 26.990000 -14.060688 3.296307 603.000000 57.333333 -585000000.000000 -17555666666.666668 -39.726667 481910081.666667 71.528835 1.638633 3
3 108.304002 10.737770 1.165694 566.200000 26.600000 -278760000.000000 687180000.000000 1.548000 349607057.720000 34.898915 -16.851358 5
4 632.714991 7.374164 1.541343 19.333333 158.333333 -24046333.333333 907393166.666667 16.270000 125797901.323333 123.049240 35.355736 6
5 95.281515 14.717580 1.814754 25.954545 308.909091 645568272.727273 871490181.818182 2.006364 730848546.662727 57.950455 7.992920 22
6 37.282919 -14.529500 2.820301 40.666667 47.555556 -133624777.777778 -1904442925.925926 -4.957037 503635899.112593 86.787432 1.378738 27
In [74]:
plt.figure(figsize=(20, 35))
plt.suptitle("Boxplot of scaled numerical variables for each cluster", fontsize=20)

for i, variable in enumerate(num_cols):
    plt.subplot(5, 3, i + 1)
    sns.boxplot(data=scaled_df1, x="K_means_segments", y=variable)

plt.tight_layout(pad=2.0)
In [75]:
plt.figure(figsize=(20, 35))
plt.suptitle("Boxplot of original numerical variables for each cluster", fontsize=20)

for i, variable in enumerate(num_cols):
    plt.subplot(5, 3, i + 1)
    sns.boxplot(data=df1, x="K_means_segments", y=variable)

plt.tight_layout(pad=2.0)

Insights:¶

  • Cluster 0:

    • Highest average for Net Cashflow and Net Income
    • Highest average for Number of outstanding shares
    • Volatility is low
  • Cluster 1:

    • No stand out values
    • All distributions are average compared to other clusters
    • Majority of entries fall into this cluster
  • Cluster 2:

    • Highest values for volatility and ROE
    • Minimum average for Net Cashflow and Net Income
  • Cluster 3:

    • Lowest values for Cash ratio and P/B ratio
    • Volatility is low
  • Cluster 4:

    • Highest Current Price and Earnings Per Share
    • P/E and P/B ratios are highest
    • Lowest average in estimated outstanding shares
  • Cluster 5:

    • Highest Price change and Cash ratio
  • Cluster 6:

    • Lowest in Price change
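
The highest and lowest clusters for each variable can be pulled out with `idxmax` and `idxmin`. The sketch below re-enters five columns of the cluster profile table above, rounded as displayed; the frame name `profile` is illustrative.

```python
import pandas as pd

# Five columns of the K-means cluster profile, copied from the table above
profile = pd.DataFrame(
    {
        "Current Price": [48.103077, 73.281231, 26.990000, 108.304002, 632.714991, 95.281515, 37.282919],
        "Price Change": [6.053507, 5.002456, -14.060688, 10.737770, 7.374164, 14.717580, -14.529500],
        "Volatility": [1.163964, 1.373721, 3.296307, 1.165694, 1.541343, 1.814754, 2.820301],
        "ROE": [27.538462, 25.303030, 603.000000, 566.200000, 19.333333, 25.954545, 40.666667],
        "Cash Ratio": [77.230769, 51.018939, 57.333333, 26.600000, 158.333333, 308.909091, 47.555556],
    },
    index=pd.Index(range(7), name="K_means_segments"),
)

# For each variable, which cluster has the highest and the lowest average?
extremes = pd.DataFrame(
    {"highest": profile.idxmax(), "lowest": profile.idxmin()}
)
print(extremes)
```

On the notebook's `cluster_profile`, the same two calls would cover all eleven variables at once.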

Hierarchical Clustering¶

In [76]:
# list of distance metrics
distance_metrics = ["euclidean", "chebyshev", "mahalanobis", "cityblock"]

# list of linkage methods
linkage_methods = ["single", "complete", "average", "weighted"]

high_cophenet_corr = 0
high_dm_lm = [0, 0]

for dm in distance_metrics:
    for lm in linkage_methods:
        Z = linkage(scaled_df, metric=dm, method=lm)
        c, coph_dists = cophenet(Z, pdist(scaled_df))
        print(
            "Cophenetic correlation for {} distance and {} linkage is {}.".format(
                dm.capitalize(), lm, c
            )
        )
        if high_cophenet_corr < c:
            high_cophenet_corr = c
            high_dm_lm[0] = dm
            high_dm_lm[1] = lm
Cophenetic correlation for Euclidean distance and single linkage is 0.9232271494002922.
Cophenetic correlation for Euclidean distance and complete linkage is 0.7873280186580672.
Cophenetic correlation for Euclidean distance and average linkage is 0.9422540609560814.
Cophenetic correlation for Euclidean distance and weighted linkage is 0.8693784298129404.
Cophenetic correlation for Chebyshev distance and single linkage is 0.9062538164750717.
Cophenetic correlation for Chebyshev distance and complete linkage is 0.598891419111242.
Cophenetic correlation for Chebyshev distance and average linkage is 0.9338265528030499.
Cophenetic correlation for Chebyshev distance and weighted linkage is 0.9127355892367.
Cophenetic correlation for Mahalanobis distance and single linkage is 0.925919553052459.
Cophenetic correlation for Mahalanobis distance and complete linkage is 0.7925307202850002.
Cophenetic correlation for Mahalanobis distance and average linkage is 0.9247324030159736.
Cophenetic correlation for Mahalanobis distance and weighted linkage is 0.8708317490180428.
Cophenetic correlation for Cityblock distance and single linkage is 0.9334186366528574.
Cophenetic correlation for Cityblock distance and complete linkage is 0.7375328863205818.
Cophenetic correlation for Cityblock distance and average linkage is 0.9302145048594667.
Cophenetic correlation for Cityblock distance and weighted linkage is 0.731045513520281.
In [77]:
#Highest cophenetic correlation
print(
    "Highest cophenetic correlation is {}, with {} distance and {} linkage.".format(
        high_cophenet_corr, high_dm_lm[0].capitalize(), high_dm_lm[1]
    )
)
Highest cophenetic correlation is 0.9422540609560814, with Euclidean distance and average linkage.

The highest value for cophenetic correlation is obtained with Euclidean distance and average linkage. We will now explore Euclidean in more detail with centroid and ward linkage methods.

In [78]:
# list of linkage methods
linkage_methods = ["single", "complete", "average", "centroid", "ward", "weighted"]

high_cophenet_corr = 0
high_dm_lm = [0, 0]

for lm in linkage_methods:
    Z = linkage(scaled_df, metric="euclidean", method=lm)
    c, coph_dists = cophenet(Z, pdist(scaled_df))
    print("Cophenetic correlation for {} linkage is {}.".format(lm, c))
    if high_cophenet_corr < c:
        high_cophenet_corr = c
        high_dm_lm[0] = "euclidean"
        high_dm_lm[1] = lm
Cophenetic correlation for single linkage is 0.9232271494002922.
Cophenetic correlation for complete linkage is 0.7873280186580672.
Cophenetic correlation for average linkage is 0.9422540609560814.
Cophenetic correlation for centroid linkage is 0.9314012446828154.
Cophenetic correlation for ward linkage is 0.7101180299865353.
Cophenetic correlation for weighted linkage is 0.8693784298129404.
In [79]:
#Highest cophenetic correlation
print(
    "Highest cophenetic correlation is {}, with {} linkage.".format(
        high_cophenet_corr, high_dm_lm[1]
    )
)
Highest cophenetic correlation is 0.9422540609560814, with average linkage.

Again, average linkage gave the highest value for cophenetic correlation. We will now explore the dendrograms to better visualize the clustering and grouping.
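For reference, the cophenetic correlation is the Pearson correlation between the original pairwise distances and the heights at which each pair of points is first merged in the dendrogram. The following is a minimal sketch on synthetic data; the random array `X` is an assumption standing in for `scaled_df`, and the manual recomputation is included only to illustrate the definition.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Synthetic stand-in for scaled_df (assumption: 60 rows, 4 standardized features)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))

# Pairwise Euclidean distances, computed once and reused for every linkage method
pairwise = pdist(X, metric="euclidean")

scores = {}
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, metric="euclidean", method=method)
    coph_corr, coph_dists = cophenet(Z, pairwise)
    # Same quantity recomputed by hand: Pearson correlation of the two distance vectors
    manual = np.corrcoef(pairwise, coph_dists)[0, 1]
    scores[method] = (coph_corr, manual)

best_method = max(scores, key=lambda m: scores[m][0])
print(best_method, scores[best_method])
```

A value close to 1 means the dendrogram preserves the original pairwise distances well; it does not by itself say whether the resulting clusters are well separated.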

Dendrograms¶

In [80]:
# Linkage methods for euclidean distance
linkage_methods = ["single", "complete", "average", "centroid", "ward", "weighted"]

# column labels for a comparison table (defined here but not used in this cell)
compare_cols = ["Linkage", "Cophenetic Coefficient"]

fig, axs = plt.subplots(len(linkage_methods), 1, figsize=(15, 30))

for i, method in enumerate(linkage_methods): #Plot each linkage method
    Z = linkage(scaled_df, metric="euclidean", method=method)

    dendrogram(Z, ax=axs[i])
    axs[i].set_title(f"Dendrogram ({method.capitalize()} Linkage)")

    coph_corr, coph_dist = cophenet(Z, pdist(scaled_df))
    axs[i].annotate(
        f"Cophenetic\nCorrelation\n{coph_corr:0.2f}",
        (0.80, 0.80),
        xycoords="axes fraction",
    )

Average linkage has the highest cophenetic correlation (0.94) and gives a decent grouping in the dendrogram. Ward linkage gives clearer separation between groups but has a lower score of 0.71. We will proceed with ward linkage, trading a lower cophenetic correlation for more distinct clusters.
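One way to complement the visual reading of a dendrogram is to cut it at a fixed number of clusters and look at the group sizes each linkage method produces. The sketch below does this with SciPy's `fcluster` on synthetic blobs; the data, the helper name `cluster_sizes`, and the choice of three clusters are assumptions for illustration, not the notebook's data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in for scaled_df (assumption): three well-separated groups
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(40, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(15, 2)),
    rng.normal(loc=-5.0, scale=0.3, size=(5, 2)),
])

def cluster_sizes(data, method, n_clusters):
    """Cut the dendrogram into at most n_clusters groups and return sorted group sizes."""
    Z = linkage(data, metric="euclidean", method=method)
    # fcluster labels start at 1, so the first bincount entry is dropped
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return sorted(np.bincount(labels)[1:].tolist(), reverse=True)

for method in ["average", "ward"]:
    print(method, cluster_sizes(X, method, n_clusters=3))
```

On real data the two methods will usually disagree, and very unbalanced sizes (one large group plus singletons) are a sign that the cut is isolating outliers rather than finding groups.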

Cluster Profile¶

In [81]:
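# Note: scikit-learn 1.2+ replaces `affinity` with `metric`; ward linkage only supports Euclidean distance
HCmodel = AgglomerativeClustering(n_clusters=4, affinity="euclidean", linkage="ward")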
HCmodel.fit(scaled_df)
Out[81]:
AgglomerativeClustering(affinity='euclidean', n_clusters=4)
In [82]:
scaled_df1["HC_Clusters"] = HCmodel.labels_
df1["HC_Clusters"] = HCmodel.labels_
data1["HC_Clusters"] = HCmodel.labels_
In [83]:
cluster_profile_hc = df1.groupby("HC_Clusters").mean()
In [84]:
cluster_profile_hc["count_in_each_segments"] = (
    df1.groupby("HC_Clusters")["Current Price"].count().values
)
In [85]:
# display the cluster profile, highlighting the maximum of each column
cluster_profile_hc.style.highlight_max(color="lightblue", axis=0)
Out[85]:
HC_Clusters | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio | K_means_segments | count_in_each_segments
0 | 48.006208 | -11.263107 | 2.590247 | 196.551724 | 40.275862 | -495901724.137931 | -3597244655.172414 | -8.689655 | 486319827.294483 | 75.110924 | -2.162622 | 5.068966 | 29
1 | 326.198218 | 10.563242 | 1.642560 | 14.400000 | 309.466667 | 288850666.666667 | 864498533.333333 | 7.785333 | 544900261.301333 | 113.095334 | 19.142151 | 4.666667 | 15
2 | 42.848182 | 6.270446 | 1.123547 | 22.727273 | 71.454545 | 558636363.636364 | 14631272727.272728 | 3.410000 | 4242572567.290909 | 15.242169 | -4.924615 | 0.000000 | 11
3 | 72.760400 | 5.213307 | 1.427078 | 25.603509 | 60.392982 | 79951512.280702 | 1538594322.807018 | 3.655351 | 446472132.228456 | 24.722670 | -2.647194 | 1.277193 | 285
In [86]:
# display the cluster profile, highlighting the minimum of each column
cluster_profile_hc.style.highlight_min(color="orange", axis=0)
Out[86]:
HC_Clusters | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio | K_means_segments | count_in_each_segments
0 | 48.006208 | -11.263107 | 2.590247 | 196.551724 | 40.275862 | -495901724.137931 | -3597244655.172414 | -8.689655 | 486319827.294483 | 75.110924 | -2.162622 | 5.068966 | 29
1 | 326.198218 | 10.563242 | 1.642560 | 14.400000 | 309.466667 | 288850666.666667 | 864498533.333333 | 7.785333 | 544900261.301333 | 113.095334 | 19.142151 | 4.666667 | 15
2 | 42.848182 | 6.270446 | 1.123547 | 22.727273 | 71.454545 | 558636363.636364 | 14631272727.272728 | 3.410000 | 4242572567.290909 | 15.242169 | -4.924615 | 0.000000 | 11
3 | 72.760400 | 5.213307 | 1.427078 | 25.603509 | 60.392982 | 79951512.280702 | 1538594322.807018 | 3.655351 | 446472132.228456 | 24.722670 | -2.647194 | 1.277193 | 285
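The profile above is built in two steps (means, then counts attached via `.values`). A small sketch of the same idea on toy data, where the counts are attached by index alignment instead of by position; the frame `toy` and its values are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for df1 (assumption): two numeric columns plus cluster labels
toy = pd.DataFrame({
    "Current Price": [10.0, 20.0, 30.0, 40.0, 50.0],
    "ROE": [5.0, 15.0, 10.0, 20.0, 30.0],
    "HC_Clusters": [0, 0, 1, 1, 1],
})

grouped = toy.groupby("HC_Clusters")
profile = grouped.mean()                             # per-cluster column means
profile["count_in_each_segments"] = grouped.size()   # rows per cluster, aligned on the index

print(profile)
```

Assigning the `size()` Series directly lets pandas match counts to clusters by label, so the result does not depend on the two groupby outputs happening to be in the same order.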

Checking sectors and sub industries in clusters

In [87]:
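# data1 also holds text columns (sector, sub industry), so average the numeric columns only
cluster_profile_hc_with_cat = data1.groupby("HC_Clusters").mean(numeric_only=True)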
In [88]:
cluster_profile_hc_with_cat["count_in_each_segments"] = (
    data1.groupby("HC_Clusters")["Current Price"].count().values
)
In [93]:
for cluster in data1["HC_Clusters"].unique():  # iterate over cluster labels
    print("In cluster {}, the following sectors are present:".format(cluster))
    print(data1[data1["HC_Clusters"] == cluster]["GICS Sector"].unique())
    print()
In cluster 3, the following sectors are present:
['Industrials' 'Health Care' 'Information Technology' 'Consumer Staples'
 'Utilities' 'Financials' 'Real Estate' 'Materials'
 'Consumer Discretionary' 'Telecommunications Services' 'Energy']

In cluster 1, the following sectors are present:
['Information Technology' 'Health Care' 'Consumer Discretionary'
 'Real Estate' 'Telecommunications Services' 'Consumer Staples']

In cluster 0, the following sectors are present:
['Industrials' 'Energy' 'Consumer Discretionary' 'Consumer Staples'
 'Materials' 'Financials' 'Information Technology']

In cluster 2, the following sectors are present:
['Financials' 'Consumer Discretionary' 'Information Technology'
 'Consumer Staples' 'Health Care' 'Telecommunications Services' 'Energy']

Health Care and Telecommunications Services appear in every cluster except 0.

Industrials appears only in clusters 3 and 0; Financials appears in clusters 3, 0 and 2.

Information Technology, Consumer Staples and Consumer Discretionary appear in all clusters.

Utilities appears only in cluster 3; Materials appears in clusters 3 and 0.

Real Estate appears in clusters 3 and 1.

Energy appears in every cluster except 1.
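Reading sector membership off the printed arrays is error-prone. A cross-tabulation gives the same information as a single table. The sketch below uses a toy frame in place of `data1` (the rows are assumptions for illustration); the column names match the notebook.

```python
import pandas as pd

# Toy stand-in for data1 (assumption): a few rows with a sector and a cluster label
toy = pd.DataFrame({
    "GICS Sector": ["Energy", "Energy", "Utilities", "Financials", "Financials", "Energy"],
    "HC_Clusters": [0, 3, 3, 2, 0, 2],
})

# Rows: sectors, columns: clusters, cells: number of companies
counts = pd.crosstab(toy["GICS Sector"], toy["HC_Clusters"])

# Clusters in which each sector has at least one company
presence = {
    sector: [cluster for cluster, n in row.items() if n > 0]
    for sector, row in counts.iterrows()
}

print(counts)
print(presence)
```

The counts also show how strongly a sector is represented in a cluster, which a simple present/absent list cannot.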

In [95]:
for cluster in data1["HC_Clusters"].unique():  # iterate over cluster labels
    print("In cluster {}, the following sub sectors are present:".format(cluster))
    print(data1[data1["HC_Clusters"] == cluster]["GICS Sub Industry"].unique())
    print()
In cluster 3, the following sub sectors are present:
['Airlines' 'Pharmaceuticals' 'Health Care Equipment'
 'Application Software' 'Semiconductors' 'Agricultural Products'
 'MultiUtilities' 'Electric Utilities' 'Life & Health Insurance'
 'Property & Casualty Insurance' 'REITs' 'Multi-line Insurance'
 'Insurance Brokers' 'Internet Software & Services' 'Specialty Chemicals'
 'Semiconductor Equipment' 'Electrical Components & Equipment'
 'Asset Management & Custody Banks' 'Specialized REITs' 'Specialty Stores'
 'Managed Health Care' 'Electronic Components' 'Aerospace & Defense'
 'Home Entertainment Software' 'Residential REITs' 'Water Utilities'
 'Consumer Finance' 'Banks' 'Biotechnology' 'Metal & Glass Containers'
 'Health Care Distributors' 'Auto Parts & Equipment'
 'Construction & Farm Machinery & Heavy Trucks' 'Real Estate Services'
 'Hotels, Resorts & Cruise Lines' 'Fertilizers & Agricultural Chemicals'
 'Regional Banks' 'Household Products' 'Air Freight & Logistics'
 'Financial Exchanges & Data' 'Industrial Machinery'
 'Health Care Supplies' 'Railroads'
 'Integrated Telecommunications Services' 'IT Consulting & Other Services'
 'Drug Retail' 'Integrated Oil & Gas' 'Diversified Chemicals'
 'Health Care Facilities' 'Industrial Conglomerates'
 'Broadcasting & Cable TV' 'Cable & Satellite'
 'Research & Consulting Services' 'Soft Drinks'
 'Oil & Gas Exploration & Production' 'Investment Banking & Brokerage'
 'Internet & Direct Marketing Retail' 'Building Products'
 'Electronic Equipment & Instruments' 'Diversified Commercial Services'
 'Retail REITs' 'Automobile Manufacturers' 'Consumer Electronics'
 'Tires & Rubber' 'Industrial Materials' 'Oil & Gas Equipment & Services'
 'Leisure Products' 'Motorcycle Manufacturers'
 'Technology Hardware, Storage & Peripherals' 'Computer Hardware'
 'Packaged Foods & Meats' 'Paper Packaging' 'Advertising' 'Trucking'
 'Networking Equipment' 'Homebuilding' 'Distributors'
 'Multi-Sector Holdings' 'Alternative Carriers' 'Restaurants'
 'Diversified Financial Services' 'Home Furnishings'
 'Construction Materials' 'Tobacco'
 'Oil & Gas Refining & Marketing & Transportation'
 'Life Sciences Tools & Services' 'Gold' 'Steel'
 'Housewares & Specialties' 'Thrifts & Mortgage Finance'
 'Technology, Hardware, Software and Supplies' 'Personal Products'
 'Industrial Gases' 'Data Processing & Outsourced Services'
 'Human Resource & Employment Services' 'Office REITs' 'Brewers'
 'Publishing' 'Specialty Retail' 'Apparel, Accessories & Luxury Goods'
 'Household Appliances' 'Environmental Services' 'Casinos & Gaming']

In cluster 1, the following sub sectors are present:
['Data Processing & Outsourced Services' 'Biotechnology'
 'Internet & Direct Marketing Retail' 'Restaurants' 'REITs'
 'Internet Software & Services' 'Integrated Telecommunications Services'
 'Health Care Equipment' 'Soft Drinks' 'Health Care Distributors']

In cluster 0, the following sub sectors are present:
['Building Products' 'Oil & Gas Exploration & Production'
 'Oil & Gas Equipment & Services' 'Integrated Oil & Gas'
 'Cable & Satellite' 'Household Products' 'Copper'
 'Oil & Gas Refining & Marketing & Transportation'
 'Diversified Financial Services' 'Application Software']

In cluster 2, the following sub sectors are present:
['Banks' 'Automobile Manufacturers' 'Semiconductors' 'Soft Drinks'
 'Pharmaceuticals' 'Integrated Telecommunications Services'
 'Integrated Oil & Gas']

Looking at boxplots for cluster distribution

In [90]:
plt.figure(figsize=(20, 35))
plt.suptitle("Boxplot of scaled numerical variables for each cluster", fontsize=20)

for i, variable in enumerate(num_cols):
    plt.subplot(5, 3, i + 1)
    sns.boxplot(data=scaled_df1, x="HC_Clusters", y=variable)

plt.tight_layout(pad=2.0)
In [91]:
plt.figure(figsize=(20, 35))
plt.suptitle("Boxplot of original numerical variables for each cluster", fontsize=20)

for i, variable in enumerate(num_cols):
    plt.subplot(5, 3, i + 1)
    sns.boxplot(data=df1, x="HC_Clusters", y=variable)

plt.tight_layout(pad=2.0)

Insights:¶

  • Cluster 0:

    • Highest volatility and ROE on average
    • Lowest price change, cash ratio, net cash flow, net income and earnings per share
    • Large variance in volatility and ROE
  • Cluster 1:

    • Highest current price, price change, cash ratio and earnings per share
    • P/E and P/B ratio are the highest on average
    • ROE is the lowest on average for the clusters
  • Cluster 2:

    • Highest net cash flow, net income and estimated shares outstanding
    • Lowest current price, volatility, P/E ratio and P/B ratio
  • Cluster 3:

    • No standout values; all columns are close to the average of the other clusters
    • Distributions are average in comparison
    • Holds the majority of entries
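Statements such as "highest" and "lowest" can be checked directly against the profile table with `idxmax` and `idxmin`. A minimal sketch, using two columns of the profile above rounded to two decimals; the frame `profile` is rebuilt by hand here and is only a stand-in for `cluster_profile_hc`.

```python
import pandas as pd

# Stand-in for cluster_profile_hc (assumption): two columns, values rounded from the table above
profile = pd.DataFrame(
    {"Volatility": [2.59, 1.64, 1.12, 1.43], "ROE": [196.55, 14.40, 22.73, 25.60]},
    index=pd.Index([0, 1, 2, 3], name="HC_Clusters"),
)

# For every column, the cluster holding the highest and the lowest mean
extremes = pd.DataFrame({"highest": profile.idxmax(), "lowest": profile.idxmin()})
print(extremes)
```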

K-means vs Hierarchical Clustering¶

Points of comparison:

  • Which clustering technique took less time for execution?
  • Which clustering technique gave you more distinct clusters, or are they the same?
  • How many observations are there in the similar clusters of both algorithms?
  • How many clusters are obtained as the appropriate number of clusters from both algorithms?

You can also mention any differences or similarities you obtained in the cluster profiles from both the clustering techniques.

The two techniques were similar in execution time, but K-means involved more subjectivity: the elbow and silhouette plots did not point to a clear number of clusters. With hierarchical clustering, distance metrics and linkage methods could be compared through the cophenetic correlation, and the dendrograms made it slightly easier to choose an appropriate number of clusters. The ward method in particular gave much more distinct clusters.
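A like-for-like way to compare the two techniques is to compute the silhouette score of each for the same range of cluster counts. The sketch below does this on synthetic blobs; the data, the helper name `silhouette_by_k`, and the range of `k` are assumptions, and the scores say nothing about the stock data itself.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for scaled_df (assumption): three well-separated blobs
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in (-4.0, 0.0, 4.0)])

def silhouette_by_k(data, k_values):
    """Return {k: (K-means silhouette, ward silhouette)} for each k."""
    rows = {}
    for k in k_values:
        km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
        hc_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(data)
        rows[k] = (silhouette_score(data, km_labels), silhouette_score(data, hc_labels))
    return rows

scores = silhouette_by_k(X, range(2, 6))
for k, (km, hc) in scores.items():
    print(f"k={k}: K-means {km:.3f}, ward {hc:.3f}")
```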

For K-means clustering:

There are 7 clusters. One holds 264 observations, three are very small (3, 5 and 6 observations), and the remaining three hold 13, 22 and 27 observations. Each cluster is distinguished by having the highest or lowest average value in one or more fields. The exception is Cluster 1, which holds the majority of observations and is close to average across the fields.

For Hierarchical Clustering:

There are 4 clusters. One holds 285 observations and the other three hold 11, 15 and 29. The three smaller clusters are each characterized by maximum or minimum average values across the fields. Cluster 3 resembles the major K-means cluster: its distributions are close to average when compared with the other clusters.
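How many observations the two solutions share can be read from a cross-tabulation of the two label columns, and the overall agreement can be summarized with the adjusted Rand index, which ignores how the clusters are numbered. A minimal sketch with toy label vectors; the labels are assumptions standing in for `K_means_segments` and `HC_Clusters`.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Toy label vectors (assumption) standing in for K_means_segments and HC_Clusters
kmeans_labels = [0, 0, 1, 1, 2, 2, 2, 3]
hc_labels = [1, 1, 0, 0, 2, 2, 2, 2]

# Rows: K-means clusters, columns: hierarchical clusters, cells: shared observations
overlap = pd.crosstab(
    pd.Series(kmeans_labels, name="K_means_segments"),
    pd.Series(hc_labels, name="HC_Clusters"),
)

# 1.0 means identical partitions; values near 0 mean chance-level agreement
ari = adjusted_rand_score(kmeans_labels, hc_labels)

print(overlap)
print(f"Adjusted Rand index: {ari:.3f}")
```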

Actionable Insights and Recommendations¶

  • The recommendations below are based on the clusters obtained from hierarchical clustering with Euclidean distance and ward linkage.
  • Health Care and Telecommunications Services appeared in every cluster except cluster 0, which shows poor performance. We recommend exploring these two sectors further.
  • Industrials appeared only in clusters 0 and 3, which suggests poor or average performance. Financials appeared in clusters 0, 2 and 3, so its picture is mixed. Neither sector would be an investment recommendation without further exploration.
  • Cluster 0:

    • High variance across distributions makes investments less stable. The cluster shows poor performance in earnings and price change. We recommend avoiding companies that fall into this group.
  • Cluster 1:

    • Highest in current price, price change and earnings per share. The cluster shows good performance metrics and positive trends. The initial buy-in may be more expensive, but companies in this group appear to be stable, good investments.
  • Cluster 2:

    • Lowest in current price and highest in shares outstanding and net income. Volatility is the lowest of all clusters and the average price change is positive. Companies in cluster 2 are cheaper to buy into but indicate smaller returns. The positive trends point to a good, stable investment over time, especially if stocks can be bought in bulk.
  • Cluster 3:

    • Cluster 3 holds the majority of observations and can be summarized as average. Its distributions sit in the middle of the other clusters, with outliers present in the boxplots. No values stand out enough to allow further identification or trend prediction. At this time, companies in this group would not be an investment recommendation.